Bad foreign key matches

Use this query to return records where the Amperity ID and the foreign key are equal, but one (or more) values associated with the email, given-name, and surname semantics are not equal. Records with incorrect matches may be an indicator of overclustering.

Tip

Foreign keys are matched deterministically which can lead to incorrect matches or overclustering.

An overcluster, or a false positive, occurs when distinct records are incorrectly added to a cluster of records. Each overcluster affects the precision of identity resolution and should be investigated to understand why it occurred.

When to use

  1. Run this query when foreign keys exist.

  2. Review the results of this query (typically ~30 minutes).

  3. Look for potentially inaccurate foreign keys.

  4. Look for indicators of overclustering.

Configure query

  1. From the Queries page, open the Stitch QA folder, and then select this query.

    Tip

    Add the Stitch QA queries template folder if it does not already exist.

  2. Update the query for unique foreign keys, customer keys, or concatenated values against which comparisons will be made.

    Uncomment the following lines and replace them with the names of unique foreign keys, one pair of lines per unique foreign key or customer key:

    t1.amperity_id AS amp_id_a,
    t2.amperity_id AS amp_id_b,
    -- t1.fk AS fk_a,
    -- t2.fk AS fk_b,
    t1.email AS email_a,
    t2.email AS email_b,
    

    Use the following pattern to make comparisons against a foreign key or customer key:

    fk_name_id, CONCAT(datasource, fk)
    

    For example, for foreign keys named fk-customer-id and fk-campaign, with the latter concatenated against the EmailList data source:

    t1.amperity_id AS amp_id_a,
    t2.amperity_id AS amp_id_b,
    t1.fk_customer_id AS fk_customer_id_a,
    t2.fk_customer_id AS fk_customer_id_b,
    t1.fk_campaign AS fk_campaign_a, CONCAT(email_list, fk_campaign)
    t2.fk_campaign AS fk_campaign_b, CONCAT(email_list, fk_campaign)
    t1.email AS email_a,
    t2.email AS email_b,
    
  3. Update the INNER JOIN for the same set of foreign keys:

    INNER JOIN Unified_Coalesced AS t2
      ON  t1.amperity_id = t2.amperity_id
      -- AND t1.fk = t2.fk
      AND hf.high_freq_email IS NULL
    

    For example:

    INNER JOIN Unified_Coalesced AS t2
      ON  t1.amperity_id = t2.amperity_id
      AND fk_customer_id.fk = fk_customer_id.fk
      AND fk_campaign.fk = fk_campaign.fk
      AND hf.high_freq_email IS NULL
    
  4. Update for other semantic values against which comparisons will be made.

    Add the following pair of lines for each added semantic value:

    t1.email AS email_a,
    t2.email AS email_b,
    t1.semantic AS semantic_a,
    t2.semantic AS semantic_b,
    

    For example, to add address and loyalty (a custom semantic typically associated with loyalty programs):

    t1.email AS email_a,
    t2.email AS email_b,
    t1.address AS address_a,
    t2.address AS address_b,
    t1.loyalty AS loyalty_a,
    t2.loyalty AS loyalty_b,
    
  5. Replace 100 with the value that best represents the definition of high-frequency email:

    LEFT JOIN (
      SELECT email, TRUE AS high_freq_email FROM Unified_Coalesced GROUP BY email
      HAVING COUNT(email) > 100
    ) AS hf ON LOWER(t1.email) = LOWER(hf.email)
    

    and then set the value for high_freq_email to NULL or TRUE:

    INNER JOIN Unified_Coalesced AS t2
      ON  t1.amperity_id = t2.amperity_id
      AND hf.high_freq_email IS NULL
    
  6. Update the INNER JOIN for all foreign keys:

    INNER JOIN Unified_Coalesced AS t2
      ON  t1.amperity_id = t2.amperity_id
      -- AND t1.fk = t2.fk
      AND hf.high_freq_email IS NULL
    

    For example:

    INNER JOIN Unified_Coalesced AS t2
      ON  t1.amperity_id = t2.amperity_id
      AND fk_customer_id.fk = fk_customer_id.fk
      AND fk_campaign.fk = fk_campaign.fk
      AND hf.high_freq_email IS NULL
    
  7. Update the INNER JOIN for any conditions for which you want to explore unmatched values. These conditions typically are done for semantic fields that should not match. Use the following pattern:

    AND TRIM(LOWER(t1.unmatched_field)) <> TRIM(LOWER(t2.unmatched_field))
    

    Uncomment the following line and replace unmatched_field with the names of the field for which you want to explore unmatched values. Add a line for each field:

    INNER JOIN Unified_Coalesced AS t2
      ON  t1.amperity_id = t2.amperity_id
      AND hf.high_freq_email IS NULL
      AND TRIM(LOWER(t1.email)) <> TRIM(LOWER(t2.email))
      AND TRIM(LOWER(t1.given_name)) <> TRIM(LOWER(t2.given_name))
      AND TRIM(LOWER(t1.surname)) <> TRIM(LOWER(t2.surname))
      --AND LOWER(t1.unmatched_field) <> LOWER(t2.unmatched_field)
    

    For example, to add address and loyalty to the list of conditions:

    INNER JOIN Unified_Coalesced AS t2
      ON  t1.amperity_id = t2.amperity_id
      AND hf.high_freq_email IS NULL
      AND TRIM(LOWER(t1.email)) <> TRIM(LOWER(t2.email))
      AND TRIM(LOWER(t1.given_name)) <> TRIM(LOWER(t2.given_name))
      AND TRIM(LOWER(t1.surname)) <> TRIM(LOWER(t2.surname))
      AND TRIM(LOWER(t1.address)) <> TRIM(LOWER(t2.address))
      AND TRIM(LOWER(t1.loyalty)) <> TRIM(LOWER(t2.loyalty))
    
  8. Click Run Query and debug any issues that may arise.

  9. Click Activate.