Split clusters

Use this query to investigate patterns in split clusters that lack transitive matching for names, email addresses, and/or post office boxes, and for semantic tagging issues that may arise when using multiple, ordinal, or namespaced semantics.

When to use

  1. Run this query to identify split clusters.

  2. Test various column selections and/or versions of columns from the Unified Coalesced table.

  3. Add/update WHERE conditions to filter the initial results.

  4. Spend 30-60 minutes examining various CASE statements and filters.

  5. Adjust selection columns from the Unified Coalesced table to inspect various cases of matching PII.

  6. Look for examples of missed nicknames.

  7. Look for examples of new address keywords or formats.

Configure query

  1. From the Queries page, open the Stitch QA folder, and then select this query.

    Tip

    Add the Stitch QA queries template folder if it does not already exist.

  2. Update the common table expression for any calculated fields required by your tenant:

     1SELECT DISTINCT
     2  amperity_id
     3  ,regexp_replace(LOWER(given_name), '[^a-zA-Z]+', '') AS given_name
     4  ,regexp_replace(LOWER(surname), '[^a-zA-Z]+', '') AS surname
     5  ,split(regexp_replace(LOWER(email), '[.]+', ''), '@')[1] AS email_username
     6  -- regex_replace(LOWER(field_name), expression, '') AS calculated_field
     7  ,LOWER(email) AS email
     8  ,regexp_replace(LOWER(address), '[.]+', '') AS address
     9  ,postal
    10  ,city
    11  ,state
    12FROM Unified_Coalesced
    

    For example, to add email_domain as a calculated field:

     1SELECT DISTINCT
     2  amperity_id
     3  ,regexp_replace(LOWER(given_name), '[^a-zA-Z]+', '') AS given_name
     4  ,regexp_replace(LOWER(surname), '[^a-zA-Z]+', '') AS surname
     5  ,split(regexp_replace(LOWER(email), '[.]+', ''), '@')[1] AS email_username
     6  ,replace(lower(email), split(regexp_replace(lower(email), '\+(.*?)\@', '@'), '@')[1], '') AS email_domain
     7  ,LOWER(email) AS email
     8  ,regexp_replace(LOWER(address), '[.]+', '') AS address
     9  ,postal
    10  ,city
    11  ,state
    12FROM Unified_Coalesced
    
  3. Extend the WHERE clause in the common table expression as necessary. For example:

    1WHERE
    2  length(regexp_replace(LOWER(given_name), '[^a-zA-Z]+', '')) > 0
    3  AND length(regexp_replace(LOWER(surname), '[^a-zA-Z]+', '')) > 0
    4  AND email IS NOT NULL
    5  AND email NOT LIKE '%+%'
    
  4. Update the list of fields to contain any additional semantic or custom fields required by your tenant:

     1SELECT
     2  uc1.amperity_id AS amp_1
     3  ,uc2.amperity_id AS amp_2
     4  ,uc1.given_name AS given_1
     5  ,uc2.given_name AS given_2
     6  ,uc1.surname AS sur_1
     7  ,uc2.surname AS sur_2
     8  ,uc1.email AS email_1
     9  ,uc2.email AS email_2
    10  ,uc1.email_username AS email_username_1
    11  ,uc2.email_username AS email_username_2
    12  ,uc1.address AS adr_1
    13  ,uc2.address AS adr_2
    14  ,uc1.postal AS pos_1
    15  ,uc2.postal AS pos_2
    16  ,uc1.city AS city_1
    17  ,uc2.city AS city_2
    18  ,uc1.state AS state_1
    19  ,uc2.state AS state_2
    20  -- ADD MATCHING PAIRS FOR SEMANTICS
    

    For example, to add postal and gender:

     1SELECT
     2  uc1.amperity_id AS amp_1
     3  ,uc2.amperity_id AS amp_2
     4  ,uc1.given_name AS given_1
     5  ,uc2.given_name AS given_2
     6  ,uc1.surname AS sur_1
     7  ,uc2.surname AS sur_2
     8  ,uc1.email AS email_1
     9  ,uc2.email AS email_2
    10  ,uc1.email_username AS email_username_1
    11  ,uc2.email_username AS email_username_2
    12  ,uc1.address AS adr_1
    13  ,uc2.address AS adr_2
    14  ,uc1.postal AS pos_1
    15  ,uc2.postal AS pos_2
    16  ,uc1.city AS city_1
    17  ,uc2.city AS city_2
    18  ,uc1.state AS state_1
    19  ,uc2.state AS state_2
    20  ,uc1.postal AS postal_1
    21  ,uc2.postal AS postal_2
    22  ,uc1.gender AS gender_1
    23  ,uc2.gender AS gender_2
    
  5. Update the list of fields for all foreign keys:

    1 -- ADD MATCHING PAIRS FOR FOREIGN KEYS
    2 -- ,t1.fk_field_name AS fk_field_name_1
    3 -- ,t2.fk_field_name AS fk_field_name_2
    

    For example:

    1,t1.fk_customer_id AS fk_customer_id_1
    2,t2.fk_customer_id AS fk_customer_id_2
    3,t1.fk_campaign AS fk_campaign_1
    4,t2.fk_campaign AS fk_campaign_2
    
  6. If using the bad-values blocklist you may apply those blocklist values to the results of this query. Uncomment the following rows:

    1-- AND blv_surname IS NULL
    2-- AND blv_given_name IS NULL
    3-- AND blv_email IS NULL
    4-- AND blv_phone IS NULL
    5-- AND blv_address IS NULL
    
  7. Click Run Query and debug any issues that may arise.

  8. Click Activate.