Detailed Examples table

The Detailed Examples table contains detailed examples of Stitch results. Use these examples to help identify which features lead to scores with the biggest effect on overall Stitch results, including how they are associated with various combinations of fields that contain PII data.

The Detailed Examples table is automatically added to the Stitch QA database as a SQL table when it is generated using the “Stitch QA” database template.

Use with Stitch QA

Use the Detailed Examples table as part of the Stitch QA process to return a detailed set of examples that show you what is in your customer 360 database. Use this data to identify which types of features lead to scores with the largest effect on overall Stitch results, including how those features are associated with various combinations of fields that contain profile (PII) data.

Configure query

The Detailed Examples query should be configured to match the fields that are in your tenant. Open this table in the Database Editor in edit mode and make the following updates:

  1. Uncomment rows of matching a/b pairs or add new rows so that the uncommented rows match your tenant.

  2. Adjust case count and score count limits, if necessary.

  3. Add customer keys (ck) if they are available (and update this query if customer keys are added).

  4. Uncomment rows for the bad-values blocklist after it is enabled for your tenant.

Column reference

The Detailed Examples table contains the following columns, as its starting point. This table is typically updated to add more pairs, enable use for blocklisted values, and to support additional tenant-specific use cases.

Column name

Data type

Description

Amperity ID

String

The unique identifier that is assigned to clusters of customer records that all represent the same individual. The Amperity ID does not replace primary and foreign keys, but exists alongside them within unified profiles.

Note

The Amperity ID is a universally unique identifier (UUID) that is represented by 36 characters spread across five groups separated by hyphens: 8-4-4-4-12.

For example:

123e4567-e89b-12d3-a456-426614174000

Address A

Address B

String

The address that is associated with the location of an individual customer record. For example: 123 Main Street.

Birthdate A

Birthdate B

String

The date of birth that is associated with an individual customer record.

Case Count

String

City A

City B

String

The city that is associated with the location of an individual customer record.

Country A

Country B

String

The country that is associated with the location of an individual customer record.

Datasource A

Datasource B

String

The name of the data source from which this customer record originated.

Tip

The combination of PK and Datasource uniquely identifies a row in the Unified Coalesced table, which correlates to a single row in a domain table.

Email A

Email B

String

The email address that is associated with an individual customer record. A customer record may be associated with multiple email addresses.

Given Name A

Given Name B

String

The first name that is associated with an individual customer record.

Match Category

String

A match category is a classifier that is applied by Amperity to an individual record-pair within a cluster of record-pairs. The match category is the result of this classification.

Match Category

Description

Exact

Amperity has the highest confidence that these records represent the same person because all profile data exactly matches.

Excellent

Amperity has near perfect confidence that these records belong to the same person, despite select types of profile data not matching.

High

Using deductive reasoning, Amperity has very high confidence that these records match, despite some profile data not matching.

Moderate

Amperity has moderate confidence that these records match, due to weak or fuzzy matches between highly unique customer attributes (email, phone, address).

Weak

Amperity lacks confidence, but if asked to guess, Amperity would assert these records do belong to the same individual, because they match on non-unique customer attributes (name, state, zip code).

No conflict

Amperity has high confidence that these records do NOT match, because core profile data is in conflict.

Also in: Unified Scores

Phone A

Phone B

String

The phone number that is associated with an individual customer record. A customer record may be associated with multiple phone numbers.

PK A

PK B

String

The combination of data source and primary key allows Amperity to uniquely identify every row in every data table across the entirety of customer data input to Amperity.

Tip

The combination of PK and Datasource uniquely identifies a row in the Unified Coalesced table, which correlates to a single row in a domain table.

Postal A

Postal B

String

The zip code or postal code that is associated with the location of an individual customer record.

Score

Float

A score contains a value from 0.0 to 5.0 that represents the combined score assigned to the record pair by Stitch. There are two components of the score: the score itself, and then its strength.

The record pair score correlates to the match category, which is a classifier that is applied by Amperity to individual record pairs. The record pair score corresponds to the classification: 5 for exact matches, 4 for excellent matches, 3 for high matches, 2 for moderate matches, 1 for weak matches, and 0 for no matches.

The record pair strength represents the strength of the record pair score. It is a two digit number. For example: .31 is a lower strength and .93 is a very high strength.

Note

Scores are shown for records that end up in the same cluster, including any scores that are below threshold. Scores are not shown for records that do not end up in the same cluster.

Also in: Unified Scores

Score Count

String

State A

State B

String

The state or province that is associated with the location of an individual customer record.

Surname A

Surname B

String

The last name that is associated with an individual customer record.

SQL reference

WITH Filtered_Scores AS (
  SELECT
    score
    ,COUNT(1) AS c
    ,500/COUNT(1) AS p
  FROM Unified_Scores
  GROUP BY score HAVING COUNT(1) > 10000
),
Sampled_Pairs AS (
  SELECT
    us.amperity_id
    ,us.match_category
    ,us.source1
    ,us.source2
    ,us.pk1
    ,us.pk2
    ,us.score
  FROM Filtered_Scores fs
  JOIN Unified_Scores us
  ON us.score = fs.score
  WHERE rand() < p
)

SELECT
  scores.match_category
  ,scores.score
  ,fs.c AS score_count
  ,scores.amperity_id
  ,t1.datasource AS datasource_a
  ,t2.datasource AS datasource_b
  ,t1.given_name AS given_name_a
  ,t2.given_name AS given_name_b
  ,t1.surname AS surname_a
  ,t2.surname AS surname_b
  -- UNCOMMENT BELOW IF full_name IS AVAILABLE
  -- ,t1.full_name AS full_name_a
  -- ,t2.full_name AS full_name_b
  -- UNCOMMENT BELOW IF generational_suffix IS AVAILABLE
  -- ,t1.generational_suffix AS generational_suffix_a
  -- ,t2.generational_suffix AS generational_suffix_b
  -- UNCOMMENT BELOW IF title IS AVAILABLE
  -- ,t1.title AS title_a
  -- ,t2.title AS title_b
  ,t1.email AS email_a
  ,t2.email AS email_b
  ,t1.phone AS phone_a
  ,t2.phone AS phone_b
  ,t1.address AS address_a
  ,t2.address AS address_b
  -- UNCOMMENT BELOW IF address2 IS AVAILABLE
  -- ,t1.address2 AS address2_a
  -- ,t2.address2 AS address2_b
  ,t1.birthdate AS birthdate_a
  ,t2.birthdate AS birthdate_b
  ,t1.city AS city_a
  ,t2.city AS city_b
  ,t1.state AS state_a
  ,t2.state AS state_b
  ,t1.country AS country_a
  ,t2.country AS country_b
  ,t1.postal AS postal_a
  ,t2.postal AS postal_b
  -- UNCOMMENT BELOW IF company IS AVAILABLE
  -- ,t1.company AS company_a
  -- ,t2.company AS company_b
  -- UNCOMMENT BELOW IF gender IS AVAILABLE
  -- ,t1.gender AS gender_a
  -- ,t2.gender AS gender_b
  ,t1.pk AS pk_a
  ,t2.pk AS pk_b
  -- UNCOMMENT BELOW IF ck IS AVAILABLE
  -- ,t1.ck AS ck_a
  -- ,t2.ck AS ck_b
  -- OPTIONAL: UNCOMMENT FOR BAD-VALUES BLACKLIST
  -- ,t1.has_blv AS has_blv_a
  -- ,t2.has_blv AS has_blv_b
  -- ,t1.blv_given_name AS blv_given_name_a
  -- ,t2.blv_given_name AS blv_given_name_b
  -- ,t1.blv_surname AS blv_surname_a
  -- ,t2.blv_surname AS blv_surname_b
  -- ,t1.blv_email AS blv_email_a
  -- ,t2.blv_email AS blv_email_b
  -- ,t1.blv_phone AS blv_phone_a
  -- ,t2.blv_phone AS blv_phone_b
  -- ,t1.blv_address AS blv_address_a
  -- ,t2.blv_address AS blv_address_b
  -- USE AS DESIRED FOR namespaced fks
  -- ,t1.fk_field_name AS fk_field_name_a
  -- ,t2.fk_field_name AS fk_field_name_b
  ,ROW_NUMBER() OVER (PARTITION BY scores.score
                      ORDER BY scores.amperity_id, scores.pk1, scores.pk2
                      ) AS case_count
  -- ,ROW_NUMBER() OVER (PARTITION BY scores.amperity_id
                         ORDER BY scores.amperity_id, scores.pk1, scores.pk2
                         ) AS amperity_id_count
FROM Sampled_Pairs AS scores
LEFT JOIN Filtered_Scores
  AS fs ON fs.score = scores.score
LEFT JOIN Unified_Coalesced
  AS t1 ON scores.pk1 = t1.pk AND scores.source1 = t1.datasource
LEFT JOIN Unified_Coalesced
  AS t2 ON scores.pk2 = t2.pk AND scores.source2 = t2.datasource
-- OPTIONAL: UNCOMMENT FOR BAD-VALUES BLOCKLIST
-- AND has_blv IS NULL
-- AND blv_surname IS NULL
-- AND blv_given_name IS NULL
-- AND blv_email IS NULL
-- AND blv_phone IS NULL
-- AND blv_address IS NULL