Split clusters¶
Use this query to investigate patterns in split clusters that lack transitive matching for names, email addresses, and/or post office boxes, and for semantic tagging issues that may arise when using multiple, ordinal, or namespaced semantics.
When to use¶
Run this query to identify split clusters.
Test various column selections and/or versions of columns from the Unified Coalesced table.
Add/update WHERE conditions to filter the initial results.
Spend 30-60 minutes examining various CASE statements and filters.
Adjust selection columns from the Unified Coalesced table to inspect various cases of matching PII.
Look for examples of missed nicknames.
Look for examples of new address keywords or formats.
Configure query¶
From the Queries page, open the Stitch QA folder, and then select this query.
Tip
Add the Stitch QA queries template folder if it does not already exist.
Update the common table expression for any calculated fields required by your tenant:
1SELECT DISTINCT 2 amperity_id 3 ,regexp_replace(LOWER(given_name), '[^a-zA-Z]+', '') AS given_name 4 ,regexp_replace(LOWER(surname), '[^a-zA-Z]+', '') AS surname 5 ,split(regexp_replace(LOWER(email), '[.]+', ''), '@')[1] AS email_username 6 -- regex_replace(LOWER(field_name), expression, '') AS calculated_field 7 ,LOWER(email) AS email 8 ,regexp_replace(LOWER(address), '[.]+', '') AS address 9 ,postal 10 ,city 11 ,state 12FROM Unified_Coalesced
For example, to add email_domain as a calculated field:
1SELECT DISTINCT 2 amperity_id 3 ,regexp_replace(LOWER(given_name), '[^a-zA-Z]+', '') AS given_name 4 ,regexp_replace(LOWER(surname), '[^a-zA-Z]+', '') AS surname 5 ,split(regexp_replace(LOWER(email), '[.]+', ''), '@')[1] AS email_username 6 ,replace(lower(email), split(regexp_replace(lower(email), '\+(.*?)\@', '@'), '@')[1], '') AS email_domain 7 ,LOWER(email) AS email 8 ,regexp_replace(LOWER(address), '[.]+', '') AS address 9 ,postal 10 ,city 11 ,state 12FROM Unified_Coalesced
Extend the WHERE clause in the common table expression as necessary. For example:
1WHERE 2 length(regexp_replace(LOWER(given_name), '[^a-zA-Z]+', '')) > 0 3 AND length(regexp_replace(LOWER(surname), '[^a-zA-Z]+', '')) > 0 4 AND email IS NOT NULL 5 AND email NOT LIKE '%+%'
Update the list of fields to contain any additional semantic or custom fields required by your tenant:
1SELECT 2 uc1.amperity_id AS amp_1 3 ,uc2.amperity_id AS amp_2 4 ,uc1.given_name AS given_1 5 ,uc2.given_name AS given_2 6 ,uc1.surname AS sur_1 7 ,uc2.surname AS sur_2 8 ,uc1.email AS email_1 9 ,uc2.email AS email_2 10 ,uc1.email_username AS email_username_1 11 ,uc2.email_username AS email_username_2 12 ,uc1.address AS adr_1 13 ,uc2.address AS adr_2 14 ,uc1.postal AS pos_1 15 ,uc2.postal AS pos_2 16 ,uc1.city AS city_1 17 ,uc2.city AS city_2 18 ,uc1.state AS state_1 19 ,uc2.state AS state_2 20 -- ADD MATCHING PAIRS FOR SEMANTICS
For example, to add postal and gender:
1SELECT 2 uc1.amperity_id AS amp_1 3 ,uc2.amperity_id AS amp_2 4 ,uc1.given_name AS given_1 5 ,uc2.given_name AS given_2 6 ,uc1.surname AS sur_1 7 ,uc2.surname AS sur_2 8 ,uc1.email AS email_1 9 ,uc2.email AS email_2 10 ,uc1.email_username AS email_username_1 11 ,uc2.email_username AS email_username_2 12 ,uc1.address AS adr_1 13 ,uc2.address AS adr_2 14 ,uc1.postal AS pos_1 15 ,uc2.postal AS pos_2 16 ,uc1.city AS city_1 17 ,uc2.city AS city_2 18 ,uc1.state AS state_1 19 ,uc2.state AS state_2 20 ,uc1.postal AS postal_1 21 ,uc2.postal AS postal_2 22 ,uc1.gender AS gender_1 23 ,uc2.gender AS gender_2
Update the list of fields for all foreign keys:
1 -- ADD MATCHING PAIRS FOR FOREIGN KEYS 2 -- ,t1.fk_field_name AS fk_field_name_1 3 -- ,t2.fk_field_name AS fk_field_name_2
For example:
1,t1.fk_customer_id AS fk_customer_id_1 2,t2.fk_customer_id AS fk_customer_id_2 3,t1.fk_campaign AS fk_campaign_1 4,t2.fk_campaign AS fk_campaign_2
If using the bad-values blocklist you may apply those blocklist values to the results of this query. Uncomment the following rows:
1-- AND blv_surname IS NULL 2-- AND blv_given_name IS NULL 3-- AND blv_email IS NULL 4-- AND blv_phone IS NULL 5-- AND blv_address IS NULL
Click Run Query and debug any issues that may arise.
Click Activate.