Detailed Examples table¶

The Detailed Examples table has detailed examples of Stitch results. Use these examples to help identify which features lead to scores with the biggest effect on Stitch results, including how they associate with various combinations of fields that contain PII data.

The Detailed Examples table is automatically added to the Stitch QA database as a SQL table when it is generated using the “Stitch QA” database template.

Use with Stitch QA¶

Use the Detailed Examples table as part of the Stitch QA process to return a detailed set of examples that show you what is in your customer 360 database. Use this data to identify which types of features lead to scores with the largest effect on overall Stitch results, including how those features are associated with various combinations of fields that contain profile (PII) data.

Configure query¶

The Detailed Examples query should be configured to match the fields that are in your tenant. Open this table in the Database Editor in edit mode and make the following updates:

Uncomment rows of matching a/b pairs or add new rows so that the uncommented rows match your tenant.
Adjust case count and score count limits, if necessary.
Add customer keys (ck) if they are available and update this query if customer keys are added.
Uncomment rows for the bad-values blocklist after it is enabled for your tenant.

Column reference¶

The Detailed Examples table has the following columns, as its starting point. This table is typically updated to add more pairs, enable use for blocklisted values, and to support additional tenant-specific use cases.

Column name

Data type

Description

Amperity ID

String

The unique identifier assigned to clusters of customer profiles that all represent the same individual. The Amperity ID does not replace primary, foreign, or other unique customer keys, but exists alongside them within unified profiles.

Note

The Amperity ID is a universally unique identifier (UUID) that is 36 characters spread across five groups separated by hyphens: 8-4-4-4-12.

For example:

123e4567-e89b-12d3-a456-426614174000

Address A

Address B

String

The address connected with the location of a customer, such as “123 Main Street”.

Birthdate A

Birthdate B

String

The date of birth connected with a customer.

Case Count

String

City A

City B

String

The city connected with the location of a customer.

Country A

Country B

String

The country connected with the location of a customer.

Datasource A

Datasource B

String

The name of the data source from which this customer profile originated.

Tip

The combination of PK and Datasource uniquely identifies a row in the Unified Coalesced table, which correlates to a single row in a domain table.

Email A

Email B

String

The email address connected with a customer. A customer may have more than one email address.

Given Name A

Given Name B

String

The first name connected with a customer.

Match Category

String

A match category is a classifier that applies to an individual record-pair within a cluster of record-pairs. The match category is the result of this classification.

Match Category	Description
Exact	Amperity has the highest confidence that these records represent the same person because all profile data exactly matches.
Excellent	Amperity has near perfect confidence that these records belong to the same person, despite select types of profile data not matching.
High	Using deductive reasoning, Amperity has high confidence that these records match, despite some profile data not matching.
Moderate	Amperity has moderate confidence that these records match, due to weak or fuzzy matches between unique customer attributes, such as email, phone, or address.
Weak	Amperity lacks confidence, but if asked to guess, Amperity would assert these records do belong to the same individual, because they match on non-unique customer attributes, such as name, state, ZIP code.
Non-match	Amperity has high confidence that these records do NOT match, because core profile data is in conflict.

Also in: Unified Scores

Phone A

Phone B

String

The phone number connected with a customer. A customer may have more than one phone number.

PK A

PK B

String

The combination of data source and primary key allows Amperity to uniquely identify every row in every data table across the entirety of customer data input to Amperity.

Tip

The combination of PK and Datasource uniquely identifies a row in the Unified Coalesced table, which correlates to a single row in a domain table.

Postal A

Postal B

String

The ZIP code or postal code connected with the location of a customer.

Score

Float

A score has a value from “0.0” to “5.0” that represents the combined score assigned to the record pair by Stitch. A score has two parts: the score is on the left side and the score’s strength is on the right.

The record pair score correlates to the match category, which is a classifier applied by Amperity to individual record pairs. The record pair score corresponds to the classification:

5 for exact matches
4 for excellent matches
3 for high matches
2 for moderate matches
1 for weak matches
0 for no matches

The record pair strength represents the strength of the record pair score. It is a two digit number. For example: .31 is a lower strength and .93 is a higher strength.

Note

Scores are shown for records that end up in the same cluster, including any scores that are below threshold. Scores are not shown for records that do not end up in the same cluster.

Also in: Unified Scores

Score Count

String

State A

State B

String

The state or province connected with the location of a customer.

Surname A

Surname B

String

The last name connected with a customer.

SQL reference¶

WITH Filtered_Scores AS (
  SELECT
    score
    ,COUNT(1) AS c
    ,500/COUNT(1) AS p
  FROM Unified_Scores
  GROUP BY score HAVING COUNT(1) > 10000
),
Sampled_Pairs AS (
  SELECT
    us.amperity_id
    ,us.match_category
    ,us.source1
    ,us.source2
    ,us.pk1
    ,us.pk2
    ,us.score
  FROM Filtered_Scores fs
  JOIN Unified_Scores us
  ON us.score = fs.score
  WHERE rand() < p
)

SELECT
  scores.match_category
  ,scores.score
  ,fs.c AS score_count
  ,scores.amperity_id
  ,t1.datasource AS datasource_a
  ,t2.datasource AS datasource_b
  ,t1.given_name AS given_name_a
  ,t2.given_name AS given_name_b
  ,t1.surname AS surname_a
  ,t2.surname AS surname_b
  -- UNCOMMENT BELOW IF full_name IS AVAILABLE
  -- ,t1.full_name AS full_name_a
  -- ,t2.full_name AS full_name_b
  -- UNCOMMENT BELOW IF generational_suffix IS AVAILABLE
  -- ,t1.generational_suffix AS generational_suffix_a
  -- ,t2.generational_suffix AS generational_suffix_b
  -- UNCOMMENT BELOW IF title IS AVAILABLE
  -- ,t1.title AS title_a
  -- ,t2.title AS title_b
  ,t1.email AS email_a
  ,t2.email AS email_b
  ,t1.phone AS phone_a
  ,t2.phone AS phone_b
  ,t1.address AS address_a
  ,t2.address AS address_b
  -- UNCOMMENT BELOW IF address2 IS AVAILABLE
  -- ,t1.address2 AS address2_a
  -- ,t2.address2 AS address2_b
  ,t1.birthdate AS birthdate_a
  ,t2.birthdate AS birthdate_b
  ,t1.city AS city_a
  ,t2.city AS city_b
  ,t1.state AS state_a
  ,t2.state AS state_b
  ,t1.country AS country_a
  ,t2.country AS country_b
  ,t1.postal AS postal_a
  ,t2.postal AS postal_b
  -- UNCOMMENT BELOW IF company IS AVAILABLE
  -- ,t1.company AS company_a
  -- ,t2.company AS company_b
  -- UNCOMMENT BELOW IF gender IS AVAILABLE
  -- ,t1.gender AS gender_a
  -- ,t2.gender AS gender_b
  ,t1.pk AS pk_a
  ,t2.pk AS pk_b
  -- UNCOMMENT BELOW IF ck IS AVAILABLE
  -- ,t1.ck AS ck_a
  -- ,t2.ck AS ck_b
  -- OPTIONAL: UNCOMMENT FOR BAD-VALUES BLACKLIST
  -- ,t1.has_blv AS has_blv_a
  -- ,t2.has_blv AS has_blv_b
  -- ,t1.blv_given_name AS blv_given_name_a
  -- ,t2.blv_given_name AS blv_given_name_b
  -- ,t1.blv_surname AS blv_surname_a
  -- ,t2.blv_surname AS blv_surname_b
  -- ,t1.blv_email AS blv_email_a
  -- ,t2.blv_email AS blv_email_b
  -- ,t1.blv_phone AS blv_phone_a
  -- ,t2.blv_phone AS blv_phone_b
  -- ,t1.blv_address AS blv_address_a
  -- ,t2.blv_address AS blv_address_b
  -- USE AS DESIRED FOR namespaced fks
  -- ,t1.fk_field_name AS fk_field_name_a
  -- ,t2.fk_field_name AS fk_field_name_b
  ,ROW_NUMBER() OVER (PARTITION BY scores.score
                      ORDER BY scores.amperity_id, scores.pk1, scores.pk2
                      ) AS case_count
  -- ,ROW_NUMBER() OVER (PARTITION BY scores.amperity_id
                         ORDER BY scores.amperity_id, scores.pk1, scores.pk2
                         ) AS amperity_id_count
FROM Sampled_Pairs AS scores
LEFT JOIN Filtered_Scores
  AS fs ON fs.score = scores.score
LEFT JOIN Unified_Coalesced
  AS t1 ON scores.pk1 = t1.pk AND scores.source1 = t1.datasource
LEFT JOIN Unified_Coalesced
  AS t2 ON scores.pk2 = t2.pk AND scores.source2 = t2.datasource
-- OPTIONAL: UNCOMMENT FOR BAD-VALUES BLOCKLIST
-- AND has_blv IS NULL
-- AND blv_surname IS NULL
-- AND blv_given_name IS NULL
-- AND blv_email IS NULL
-- AND blv_phone IS NULL
-- AND blv_address IS NULL