Merged Customers table

The Merged Customers table is a data table that must be added to a customer 360 database. The purpose of the Merged Customers table is to collect rows from the Unified Coalesced table, and then collapse those into a single row per Amperity ID. Columns in the Merged Customers table are made available to segmentation via the Customer 360 table.

Tip

This topic describes a recommended starting point for the Merged Customers table.

The Merged Customers table collects PII data from the Unified Coalesced table, and then performs additional processing and grouping of this data prior to making this data available to the Customer 360 table.

Use the Merged Customers tables to apply bad-values blocklists to merge rules.

The Merged Customers table is added to the customer 360 database when you create that database using the database template. You may build the Merged Customers table by hand using the Merged_Customers template .

This topic describes the starting point for the Merged Customers table, and then steps through the process of updating it to be specific to your tenant. (This topic does not attempt to address all of the specific use cases you may have for your tenant.)

Common ways of extending this table to support additional use cases are described, along with providing links to more detailed examples, when available. Start with the SQL that is added to your tenant by the “Customer 360” database template, and then add support for all of your required use cases.

Requirements

This topic assumes the following requirements are met:

  • PII semantic tags are applied consistently, including assigning the correct data types, to all feeds that contain customer records. For custom PII semantics and/or non-PII data that must be accessible from the Unified Coalesced table, you may need to extend the Merged Customers table to support them.

  • Feeds that contain customer records are made available to Stitch.

  • Each feed has a primary key; foreign keys are applied consistently across all feeds that are made available to Stitch.

  • At least one feed must contain a field that indicates when the data was last updated.

    Apply the update-dt semantic tag to columns in customer records that identify when the data was last updated. The field to which this semantic is applied must be a datetime field type. At least one customer record must have this semantic tag applied to ensure that the update_dt column is created in the Unified Coalesced table and to ensure that the Merged Customers table behaves correctly.

  • You can extend the Merged Customers table to exclude common or “bad” values. Extensions may be done after the Merged Customers table is created because they often have both downstream and upstream dependencies within the Amperity workflow that will require some tuning within for the Merged Customers table.

Add table

The Merged Customers table is a data table that must be added to a customer 360 database. The purpose of the Merged Customers table is to collect rows from the Unified Coalesced table, and then collapse those into a single row per Amperity ID. Columns in the Merged Customers table are made available to segmentation via the Customer 360 table.

Note

The Merged Customers table uses a Spark SQL query to pull the Amperity ID, along with all PII data, from the Unified Coalesced table.

The Unified Coalesced table contains all the PII data that has been processed through Stitch. The data is organized by the semantic tag as the column heading and then coalesced into one single table. A unique Amperity ID may appear in more than one row.

To add the Merged Customers table

  1. From the Customer 360 page, under All Databases, select the menu for the customer 360 database, and then click Edit.

  2. From the Database Editor, click Add Table.

  3. Name the table “Merged_Customers”.

  4. Set the build mode to SQL.

  5. Click Apply template, and then select Merged Customers.

  6. Update the placeholder names (shown as “Domain:Table” in the query) and set field-level priorities.

  7. Click Validate to verify the SQL runs without error.

  8. Click Next. This opens the Database Table Definition page.

  9. Add a table description. This enables a tooltip that is visible from other areas in Amperity.

  10. Verify that the db/required and db/unique database field semantics were applied to the amperity_id column.

  11. Verify that semantic tags—given_name, surname, email, phone, address, city, state, postal, birthdate, gender, etc.—were applied to all PII fields correctly.

    Tip

    You can clear incorrectly tagged semantics individually or for the entire table. To clear semantic tags for the entire table, under Semantic Suggestions click Clear semantics.

  12. From the Table Semantics drop-down, select Merged Customers.

  13. Under Version History, select Enable table version history.

  14. Click Save.

SQL query

The following SQL query is the recommended starting point for the Merged Customers table. It exists in two parts: a window function that collects and groups PII data, and then a statement that updates the Merged Customers table with the results. This query does the following:

  1. Provides a location in which domain tables (i.e. “feeds that are made available to Stitch”) are assigned priority. (This section must be updated to contain the names of the domain tables for your tenant that should be assigned a non-default priority.)

  2. Provides a location in which PII fields are assigned priority. (This section must be updated to contain the names of the domain tables that are assigned source priority, and then be updated to assign priorities to logical groups of fields that contain PII data, e.g. email addresses, physical addresses, and so on.)

  3. Left joins the fields in the Unified Coalesced table into a temporary table for prioritization.

  4. Left joins the fields in the Unified Coalesced table to form groupings of PII fields. For example, all address-related fields (address, address2, city, state, postal, and country) are grouped together. This is done for names, addresses, phone numbers, email addresses, birthdates, and gender.

  5. Left joins the fields in the Unified Coalesced table by logical groups and by assigned priority.

  6. Adds a set of fields to the temporary table for each of the logical groups, which are then added to the Merged Customers table.

  7. Refreshes the data in the Merged Customers table.

Refer to the required updates and the example of updates for more information about how to apply the recommended query to the specific domain tables in your tenant.

Required updates

The recommended starting point for the Merged Customers table requires the following updates:

  • Update the list of domain tables under Source_Priority to contain at least one domain table with fields tagged for PII semantics.

  • Update the list of domain tables under Field_Priority to contain the same list of domain tables, and then specify individual field priorities across all tables. The order of the fields under each Domain:Table must match the order of the fields under fp_datasource.

Source priority

Update the list of domain tables under Source_Priority to contain at least one domain table with fields tagged for PII semantics. Only domain tables that are made available to Stitch may be listed in the source priority. An integer value will assign priority, where “1” has a higher priority than “2”. Domain tables may be assigned the same priority. Any domain table not specified here is assigned a default priority of “999”.

WITH
  Source_Priority AS (
    SELECT *
    FROM VALUES
      ("Domain:Table", x)
      ,("Domain:Table", x)
    AS (sp_datasource, priority)
  )

Field priority

Use the field priority table to assign priorities for individual fields that are different from the priority assigned generally to each source domain table Source_Priority. Update the list of tables to match the same domain tables that were specified for Source_Priority, and then update the field priority table to specify individual priorities.

  • A NULL value will use the priority assigned to the source domain table or, when a priority is not assigned by a source domain table, the default priority of “999”.

  • An integer value will assign priority, where “1” has a higher priority than “2”.

,Field_Priority AS (
  SELECT *
  FROM VALUES
    ("Domain:Table", null, null, null, null, null, null)
    ,("Domain:Table", null, null, null, null, null, null)
  AS (fp_datasource, name_pri, address_pri, email_pri, phone_pri, birthdate_pri, gender_pri)

Caution

The order of the fields under each Domain:Table matches the order of the fields under fp_datasource. Be sure to apply the priority in the correct location. For example, assigning field priority for email addresses:

,null      ,null         ,1          ,null       ,null           ,null

,null      ,null         ,2          ,null       ,null           ,null

,name_pri  ,address_pri  ,email_pri  ,phone_pri  ,birthdate_pri  ,gender_pri

Example of updates

The following example shows three domain tables: Table A, Table B, and Table C, with source priority assigned in the same order. The email address in Table B is assigned the highest priority, the address and phone in Table C is assigned the highest priority, and the name is assigned the highest priority in all three tables.

The database in this tenant also includes three other tables: Table D, Table E, and Table F. These tables contain fields tagged with PII semantics, but none of them are higher priority than Table A, Table B, or Table C. By not adding Table D, Table E, and Table F to the SQL query for the Merged Customers table, these tables will be automatically assigned the default priority of 999.

WITH
  Source_Priority AS (
    SELECT *
    FROM VALUES
      ("Table:A",  1)
      ,("Table:B", 2)
      ,("Table:C", 3)
    AS (sp_datasource, priority)
  )

  ,Field_Priority AS (
    SELECT *
    FROM VALUES
      ("Table:A",1,2,2,2,null,null)
      ,("Table:B",1,2,1,2,null,null)
      ,("Table:C",1,1,2,1,null,null)
    AS (fp_datasource, name_pri, address_pri, email_pri, phone_pri, birthdate_pri, gender_pri)
  )

Best email address

The Email Ampid Assignment table narrows the many-to-many relationships between Amperity IDs and email addresses down to one-to-one relationships, and then makes the one-to-one relationships available to the Merged Customers table to help your brand identify the best email address for each of your customers.

The Email Ampid Assignment table is configured by default in the SQL template for the Merged Customers table. It is recommended that your tenant use the Email Ampid Assignment table to ensure that downstream workflows can access the best email addresses for each Amperity ID.

Note

The Email Ampid Assignment table does not prevent Amperity IDs from being associated with multiple email addresses or prevent email addresses from being associated with multiple Amperity IDs.

This table enforces a one-to-one relationship for use in the Merged Customers table.

Many-to-many relationships exist and are accessible from the Unified Coalesced, All Opted-in Emails, and Email Opt Status tables.

Using the Email Ampid Assignment table is strongly recommended, but may be byassed when email opt status and/or email engagement data is not available in your tenant.

If your tenant chooses to bypass the Email Ampid Assignment table, the following steps are required to update the Merged Customers table to use email address priority and completion values that do not rely on the Email Ampid Assignment table.

  1. Add NAMED_STRUCT for email addresses

  2. Add email address priority

  3. Replace email completion

  4. Undo LEFT JOIN

Important

The previous steps are only necessary when the Email Ampid Assignment table is not configured for your tenant.

Add NAMED_STRUCT for email addresses

When your tenant is not using the Email Ampid Assignment table you must replace the following comment:

-- NAMED_STRUCT block for email addresses goes here if your
-- tenant is not using the Email_Ampid_Assignment table.

with:

,NAMED_STRUCT(
  'email', email
  ,'pk', pk
  ,'update_dt', update_dt
  ,'datasource', datasource
  ,'priority', COALESCE(fp.email_pri, up.priority, 999)
  ,'completion', INT(ISNOTNULL(email))
) AS `email_struct`

This should be placed in-between the NAMED_STRUCT blocks for address and phone. Refer to the recommended starting point in this topic to refernce the position of this block within the SQL template for the Merged Customers table.

Add email address priority

When your tenant is not using the Email Ampid Assignment table you must replace the following comment:

-- FIRST block for email priority goes here if your
-- tenant is not using the Email_Ampid_Assignment table.

with:

,FIRST(email_struct)
  OVER (
    PARTITION BY amperity_id
    ORDER BY email_struct.completion DESC
             ,email_struct.priority
             ,email_struct.update_dt DESC
             ,email_struct.pk
  ) AS email_struct

This should be placed in-between the FIRST blocks for address and phone. Refer to the recommended starting point in this topic to refernce the position of this block within the SQL template for the Merged Customers table.

Note

If your tenant is using the bad-values blocklist you must also add the following line:

,email_struct.blv

Replace email completion

When your tenant is not using the Email Ampid Assignment table you must replace the following SQL:

-- Get email completion from Email Ampid Assignment table
,email_ampid_assignment.email
,CASE
  WHEN email_ampid_assignment.email IS NULL
  THEN 0
  ELSE 1
END AS email_completion

with:

,up.email_struct.email
,up.email_struct.pk AS `email_pk`
,up.email_struct.update_dt AS `email_update_dt`
,up.email_struct.datasource AS `email_datasource`
,up.email_struct.priority AS `email_priority`
,up.email_struct.completion AS `email_completion`

Note

If your tenant is using the bad-values blocklist you must also add the following line:

,up.email_struct.blv AS `email_blv`

Undo LEFT JOIN

When your tenant is not using the Email Ampid Assignment table you must remove the following LEFT JOIN at the bottom of the SQL template:

LEFT JOIN email_ampid_assignment
ON email_ampid_assignment.amperity_id = up.amperity_id

Extend PII semantics

You can extend the Merged Customers table to preprocess fields that contain PII semantics to improve the quality of the data that ends up in the Customer 360 table:

Add custom semantics

You can create custom semantics for PII and non-PII fields. All custom semantics are added to the Unified Coalesced table.

  1. Apply the custom semantics to to feeds. Ensure each custom semantic uses a consistent data type across all feeds.

  2. Extend the Merged Customers table with custom merge rules that load the column created by the custom semantic from the Unified Coalesced table.

  3. Apply similar merge patterns used for PII semantics.

Within the Merged Customers table, you must decide if the custom semantic requires source and/or field priority:

For example, a company may have data sources that provide data about its own employees, such as internal email addresses, phone numbers, and so on. Use a custom PII semantic named email-internal to tag the fields that contain internal email addresses. Apply this custom PII semantic to all fields in all data sources that contain internal email addresses.

Consistently tagging fields with internal email addresses with the email-internal custom PII semantic will create a column named email_internal in the Unified Coalesced table. The Merged Customers table can use this column to extend the merge rules to include rules for internal email addresses.

Important

Do not apply the default email semantic to the email-internal columns or add the email-internal semantic to any merge rules that handle the presence of namespaced or ordinal custom email semantics.

Assign field priority

Assigning field priority to a custom PII semantic is optional and should only be done when a custom PII semantic is tagged in multiple data sources, requires a priority other than the default priority (“999”), and requires these data sources to be assigned different priorities.

To extend merge rules to assign field priority

  1. From the Customer 360 page, under All Databases, open the menu for the customer 360 database, and then select Edit.

  2. Open the Merged Customers table.

  3. Find the Field_Priority section and update it to add the email_internal column to the field priority list:

    ,Field_Priority AS (
      SELECT *
      FROM VALUES
        ("Domain:Table", null, null, null, null, null, null)
        ,("Domain:Table", null, null, null, null, null, null)
      AS (fp_datasource, name_pri, address_pri, email_pri, phone_pri, birthdate_pri, gender_pri)
    )
    
  4. Find the Unified_Structs sections, and then add a named struct for the email_internal column:

    ,NAMED_STRUCT(
      'email_internal', email_internal
      ,'pk', pk
      ,'update_dt', update_dt
      ,'datasource', datasource
      ,'priority', COALESCE(fp.email_internal_pri, up.priority, 999)
      ,'completion', INT(ISNOTNULL(email_internal))
    ) AS `email_internal_struct`
    
  5. Find the Unified_Prioritized section, and then add a first value block for the email_internal column:

    ,FIRST(email_internal_struct)
      OVER (
        PARTITION BY amperity_id
        ORDER BY email_internal_struct.completion DESC
                 ,email_internal_struct.priority
                 ,email_internal_struct.update_dt DESC
                 ,email_internal_struct.pk
      ) AS email_internal_struct
    
  6. Find the SELECT statement that builds the Merged Customers table, and then add the columns for email_internal:

    ,up.email_internal_struct.email_internal
    ,up.email_internal_struct.pk AS `email_internal_pk`
    ,up.email_internal_struct.update_dt AS `email_internal_update_dt`
    ,up.email_internal_struct.datasource AS `email_internal_datasource`
    ,up.email_internal_struct.priority AS `email_internal_priority`
    ,up.email_internal_struct.completion AS `email_internal_completion`
    
  7. Validate the query.

  8. Click Next. This opens the Database Table Definition page.

  9. Verify these settings, and then click Save.

  10. Run the customer 360 database.

Skip field priority

Field priority is only necessary when a custom PII semantic is tagged in multiple data sources, requires a priority other than the default priority (“999”), and requires these data sources to be assigned different priorities. Any custom semantic that does not meet this criteria should be assigned a default field priority and should not be added to the columns in the field priority list.

To extend merge rules to skip field priority

  1. From the Customer 360 page, under All Databases, open the menu for the customer 360 database, and then select Edit.

  2. Open the menu for the Merged Customers table and select Edit.

  3. Find the Unified_Structs sections, and then add a named struct for the email_internal column:

    ,NAMED_STRUCT(
      'email_internal', email_internal
      ,'pk', pk
      ,'update_dt', update_dt
      ,'datasource', datasource
      ,'priority', COALESCE(up.priority, 999)
      ,'completion', INT(ISNOTNULL(email_internal))
    ) AS `email_internal_struct`
    
  4. Find the Unified_Prioritized section, and then add a first value block for the email_internal column:

    ,FIRST(email_internal_struct)
      OVER (
        PARTITION BY amperity_id
        ORDER BY email_internal_struct.completion DESC
                 ,email_internal_struct.priority
                 ,email_internal_struct.update_dt DESC
                 ,email_internal_struct.pk
      ) AS email_internal_struct
    
  5. Find the SELECT statement that builds the Merged Customers table, and then add the columns for email_internal:

    ,up.email_internal_struct.email_internal
    ,up.email_internal_struct.pk AS `email_internal_pk`
    ,up.email_internal_struct.update_dt AS `email_internal_update_dt`
    ,up.email_internal_struct.datasource AS `email_internal_datasource`
    ,up.email_internal_struct.priority AS `email_internal_priority`
    ,up.email_internal_struct.completion AS `email_internal_completion`
    
  6. Validate the query.

  7. Click Next. This opens the Database Table Definition page.

  8. Verify these settings, and then click Save.

  9. Run the customer 360 database.

Analyze PII by datasource

You can measure how much data a single feed contributes to the larger set of data by using datasource columns in the Merged Customers table. These columns are associated with the original primary key and updated dates and allow queries to be built in the customer 360 database to analyze the effects of one data source on the larger set of data in the customer 360 database.

The Merged Customers table is already configured to provide this data. For example:

SELECT
  up.amperity_id
  ,up.address_struct.address
  ,up.address_struct.address2
  ,up.address_struct.city
  ,up.address_struct.state
  ,up.address_struct.postal
  ,up.address_struct.country
  ,up.address_struct.pk AS `address_pk`
  ,up.address_struct.update_dt AS `address_update_dt`
  ,up.address_struct.datasource AS `address_datasource`
  ,up.address_struct.priority AS `address_priority`
  ,up.address_struct.completion AS `address_completion`

If this is missing, you must add the following line to each grouping, updated correctly for each semantic:

,up.address_struct.datasource AS `address_datasource`:

Bad-values blocklist

A bad-values blocklist contains known values that appear frequently in data and should be excluded from the Stitch process.

When using the bad-values blocklist, you must update the Merged Customers table include the bad-values blocklist items in the merge rules.

Concatenate full names

The following example shows how to extend the Merged Customers table to concatenate full names based on the given-name and surname semantic tags.

To concatenate full names

  1. From the Customer 360 tab, under All Databases, open the menu for the customer 360 database, and then select Edit.

  2. Open the menu for the Merged Customers table and select Edit.

  3. Find the Unified_Preprocessed section and add the highlighted line:

    ,Unified_Preprocessed AS (
      SELECT *
      ,CONCAT_WS(' ', given_name, surname) AS full_name_p
      FROM Unified_Coalesced uc
      LEFT JOIN Source_Priority sp ON sp.sp_datasource = uc.datasource
    )
    
  4. Find the NAMED_STRUCT with the name column, and then update it to use the coalesced column full_name_p. This must be done in the following locations:

    ,NAMED_STRUCT(
      'given_name', given_name
      ,'surname', surname
      ,'full_name', full_name_p
      ,'generational_suffix', generational_suffix
      ,'pk', pk
      ,'update_dt', update_dt
      ,'datasource', datasource
      ,'priority', COALESCE(fp.name_pri, up.priority, 999)
      ,'completion', INT((ISNOTNULL(given_name)
                     AND ISNOTNULL(surname))
                     OR ISNOTNULL(full_name_p))
    
    ) AS `name_struct`
    
  5. Validate the query.

  6. Click Next. This opens the Database Table Definition page.

  7. Verify these settings, and then click Save.

  8. Run the customer 360 database.

Multiple email addresses

The following example shows how to extend the Merged Customers table to support email addresses that are tagged with namespaced custom email semantic tags: email-home and email-work. These tags are applied in addition to the default email semantic. These tags will create two columns in the Unified Coalesced table: email_home and email_work.

To support custom email semantics

  1. From the Customer 360 page, under All Databases, open the menu for the customer 360 database, and then select Edit.

  2. Open the menu for the Merged Customers table and select Edit.

  3. Find the Unified_Preprocessed section and add the highlighted line:

    ,Unified_Preprocessed AS (
      SELECT *
      ,COALESCE(email_home, email_work) AS email_p
      FROM Unified_Coalesced uc
      LEFT JOIN Source_Priority sp ON sp.sp_datasource = uc.datasource
    )
    

    Replace email_home and email_work with the correct names for the custom semantic tags pattern you are using for your tenant.

  4. Find the NAMED_STRUCT with the email column, and then update it to use the coalesced column email_p. This must be done in the following locations:

    ,NAMED_STRUCT(
      'email', email_p
      ,'pk', pk
      ,'update_dt', update_dt
      ,'datasource', datasource
      ,'priority', COALESCE(fp.email_pri, up.priority, 999)
      ,'completion', INT(ISNOTNULL(email_p))
    ) AS `email_struct`
    
  5. Validate the query.

  6. Click Next. This opens the Database Table Definition page.

  7. Verify these settings, and then click Save.

  8. Run the customer 360 database.

Multiple phone numbers

The following example shows how to extend the Merged Customers table to support multiple phone numbers that are tagged with ordinal custom semantic tags: phone-1 and phone-2. These tags are applied in addition to the default phone semantic. These tags will create two columns in the Unified Coalesced table: phone_1 and phone_2.

To support custom phone semantics

  1. From the Customer 360 page, under All Databases, open the menu for the customer 360 database, and then select Edit.

  2. Open the menu for the Merged Customers table and select Edit.

  3. Find the Unified_Preprocessed section and add the highlighted line:

    ,Unified_Preprocessed AS (
      SELECT *
      ,COALESCE(phone_1, phone_2) AS phone_p
      FROM Unified_Coalesced uc
      LEFT JOIN Source_Priority sp ON sp.sp_datasource = uc.datasource
    )
    

    Replace phone_1 and phone_2 with the correct names for the custom semantic tags pattern you are using for your tenant.

  4. Find the NAMED_STRUCT with the phone column, and then update it to use the coalesced column phone_p. This must be done in the following locations:

    ,NAMED_STRUCT(
      'phone', phone_p
      ,'pk', pk
      ,'update_dt', update_dt
      ,'datasource', datasource
      ,'priority', COALESCE(fp.phone_pri, up.priority, 999)
      ,'completion', INT(ISNOTNULL(phone_p))
    ) AS `phone_struct`
    
  5. Validate the query.

  6. Click Next. This opens the Database Table Definition page.

  7. Verify these settings, and then click Save.

  8. Run the customer 360 database.