Detailed Examples table¶
The Detailed Examples table contains detailed examples of Stitch results. Use these examples to help identify which features lead to scores with the biggest effect on overall Stitch results, including how they are associated with various combinations of fields that contain PII data.
The Detailed Examples table is automatically added to the Stitch QA database as a SQL table when it is generated using the “Stitch QA” database template.
Use with Stitch QA¶
Use the Detailed Examples table as part of the Stitch QA process to return a detailed set of examples that show you what is in your customer 360 database. Use this data to identify which types of features lead to scores with the largest effect on overall Stitch results, including how those features are associated with various combinations of fields that contain profile (PII) data.
Configure query¶
The Detailed Examples query should be configured to match the fields that are in your tenant. Open this table in the Database Editor in edit mode and make the following updates:
Uncomment rows of matching a/b pairs or add new rows so that the uncommented rows match your tenant.
Adjust case count and score count limits, if necessary.
Add customer keys (ck) if they are available (and update this query if customer keys are added).
Uncomment rows for the bad-values blocklist after it is enabled for your tenant.
Column reference¶
The Detailed Examples table contains the following columns, as its starting point. This table is typically updated to add more pairs, enable use for blocklisted values, and to support additional tenant-specific use cases.
Column name |
Data type |
Description |
||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Amperity ID |
String |
The unique identifier that is assigned to clusters of customer records that all represent the same individual. The Amperity ID does not replace primary and foreign keys, but exists alongside them within unified profiles. Note The Amperity ID is a universally unique identifier (UUID) that is represented by 36 characters spread across five groups separated by hyphens: 8-4-4-4-12. For example: 123e4567-e89b-12d3-a456-426614174000
|
||||||||||||||
Address A Address B |
String |
The address that is associated with the location of a customer, such as “123 Main Street”. |
||||||||||||||
Birthdate A Birthdate B |
String |
The date of birth that is associated with a customer. |
||||||||||||||
Case Count |
String |
|||||||||||||||
City A City B |
String |
The city that is associated with the location of a customer. |
||||||||||||||
Country A Country B |
String |
The country that is associated with the location of a customer. |
||||||||||||||
Datasource A Datasource B |
String |
The name of the data source from which this customer record originated. Tip The combination of PK and Datasource uniquely identifies a row in the Unified Coalesced table, which correlates to a single row in a domain table. |
||||||||||||||
Email A Email B |
String |
The email address that is associated with a customer. A customer may have more than one email address. |
||||||||||||||
Given Name A Given Name B |
String |
The first name that is associated with a customer. |
||||||||||||||
Match Category |
String |
A match category is a classifier that is applied by Amperity to an individual record-pair within a cluster of record-pairs. The match category is the result of this classification.
Also in: Unified Scores |
||||||||||||||
Phone A Phone B |
String |
The phone number that is associated with a customer. A customer may have more than one phone number. |
||||||||||||||
PK A PK B |
String |
The combination of data source and primary key allows Amperity to uniquely identify every row in every data table across the entirety of customer data input to Amperity. Tip The combination of PK and Datasource uniquely identifies a row in the Unified Coalesced table, which correlates to a single row in a domain table. |
||||||||||||||
Postal A Postal B |
String |
The zip code or postal code that is associated with the location of a customer. |
||||||||||||||
Score |
Float |
A score contains a value from 0.0 to 5.0 that represents the combined score assigned to the record pair by Stitch. There are two components of the score: the score itself, and then its strength. The record pair score correlates to the match category, which is a classifier that is applied by Amperity to individual record pairs. The record pair score corresponds to the classification: 5 for exact matches, 4 for excellent matches, 3 for high matches, 2 for moderate matches, 1 for weak matches, and 0 for no matches. The record pair strength represents the strength of the record pair score. It is a two digit number. For example: .31 is a lower strength and .93 is a very high strength. Note Scores are shown for records that end up in the same cluster, including any scores that are below threshold. Scores are not shown for records that do not end up in the same cluster. Also in: Unified Scores |
||||||||||||||
Score Count |
String |
|||||||||||||||
State A State B |
String |
The state or province that is associated with the location of a customer. |
||||||||||||||
Surname A Surname B |
String |
The last name that is associated with a customer. |
SQL reference¶
WITH Filtered_Scores AS (
SELECT
score
,COUNT(1) AS c
,500/COUNT(1) AS p
FROM Unified_Scores
GROUP BY score HAVING COUNT(1) > 10000
),
Sampled_Pairs AS (
SELECT
us.amperity_id
,us.match_category
,us.source1
,us.source2
,us.pk1
,us.pk2
,us.score
FROM Filtered_Scores fs
JOIN Unified_Scores us
ON us.score = fs.score
WHERE rand() < p
)
SELECT
scores.match_category
,scores.score
,fs.c AS score_count
,scores.amperity_id
,t1.datasource AS datasource_a
,t2.datasource AS datasource_b
,t1.given_name AS given_name_a
,t2.given_name AS given_name_b
,t1.surname AS surname_a
,t2.surname AS surname_b
-- UNCOMMENT BELOW IF full_name IS AVAILABLE
-- ,t1.full_name AS full_name_a
-- ,t2.full_name AS full_name_b
-- UNCOMMENT BELOW IF generational_suffix IS AVAILABLE
-- ,t1.generational_suffix AS generational_suffix_a
-- ,t2.generational_suffix AS generational_suffix_b
-- UNCOMMENT BELOW IF title IS AVAILABLE
-- ,t1.title AS title_a
-- ,t2.title AS title_b
,t1.email AS email_a
,t2.email AS email_b
,t1.phone AS phone_a
,t2.phone AS phone_b
,t1.address AS address_a
,t2.address AS address_b
-- UNCOMMENT BELOW IF address2 IS AVAILABLE
-- ,t1.address2 AS address2_a
-- ,t2.address2 AS address2_b
,t1.birthdate AS birthdate_a
,t2.birthdate AS birthdate_b
,t1.city AS city_a
,t2.city AS city_b
,t1.state AS state_a
,t2.state AS state_b
,t1.country AS country_a
,t2.country AS country_b
,t1.postal AS postal_a
,t2.postal AS postal_b
-- UNCOMMENT BELOW IF company IS AVAILABLE
-- ,t1.company AS company_a
-- ,t2.company AS company_b
-- UNCOMMENT BELOW IF gender IS AVAILABLE
-- ,t1.gender AS gender_a
-- ,t2.gender AS gender_b
,t1.pk AS pk_a
,t2.pk AS pk_b
-- UNCOMMENT BELOW IF ck IS AVAILABLE
-- ,t1.ck AS ck_a
-- ,t2.ck AS ck_b
-- OPTIONAL: UNCOMMENT FOR BAD-VALUES BLACKLIST
-- ,t1.has_blv AS has_blv_a
-- ,t2.has_blv AS has_blv_b
-- ,t1.blv_given_name AS blv_given_name_a
-- ,t2.blv_given_name AS blv_given_name_b
-- ,t1.blv_surname AS blv_surname_a
-- ,t2.blv_surname AS blv_surname_b
-- ,t1.blv_email AS blv_email_a
-- ,t2.blv_email AS blv_email_b
-- ,t1.blv_phone AS blv_phone_a
-- ,t2.blv_phone AS blv_phone_b
-- ,t1.blv_address AS blv_address_a
-- ,t2.blv_address AS blv_address_b
-- USE AS DESIRED FOR namespaced fks
-- ,t1.fk_field_name AS fk_field_name_a
-- ,t2.fk_field_name AS fk_field_name_b
,ROW_NUMBER() OVER (PARTITION BY scores.score
ORDER BY scores.amperity_id, scores.pk1, scores.pk2
) AS case_count
-- ,ROW_NUMBER() OVER (PARTITION BY scores.amperity_id
ORDER BY scores.amperity_id, scores.pk1, scores.pk2
) AS amperity_id_count
FROM Sampled_Pairs AS scores
LEFT JOIN Filtered_Scores
AS fs ON fs.score = scores.score
LEFT JOIN Unified_Coalesced
AS t1 ON scores.pk1 = t1.pk AND scores.source1 = t1.datasource
LEFT JOIN Unified_Coalesced
AS t2 ON scores.pk2 = t2.pk AND scores.source2 = t2.datasource
-- OPTIONAL: UNCOMMENT FOR BAD-VALUES BLOCKLIST
-- AND has_blv IS NULL
-- AND blv_surname IS NULL
-- AND blv_given_name IS NULL
-- AND blv_email IS NULL
-- AND blv_phone IS NULL
-- AND blv_address IS NULL