Merged_Customers Table

The Merged_Customers table is a data table that must be added to a customer 360 database. The purpose of the Merged_Customers table is to collect rows from the Unified_Coalesced table, and then collapse those into a single row per Amperity ID. Columns in the Merged_Customers table are made available to segmentation via the Customer_360 table.

Important

The Merged_Customers table was previously known as the Unified_Merged table. This change was made to support improvements for using interactions records within Amperity. This change is primarily a naming change; many tenants continue to use the Unified_Merged naming pattern. When you see Merged_Customers in the documentation, it also refers to Unified_Merged. You should use Merged_Customers as the naming pattern for any tenant in which a Unified_Merged table does not already exist.

Tip

This topic describes a recommended starting point for the Merged_Customers table.

The Merged_Customers table collects PII data from the Unified_Coalesced table, and then performs additional processing and grouping of this data prior to making this data available to the Customer_360 table.

Use the Merged_Customers tables to apply bad-values blocklists to merge rules.

The Merged_Customers table is added to the customer 360 database when you create that database using the database template. You may build the Merged_Customers table by hand using the Merged_Customers template .

This topic describes the starting point for the Merged_Customers table, and then steps through the process of updating it to be specific to your tenant. (This topic does not attempt to address all of the specific use cases you may have for your tenant.)

Common ways of extending this table to support additional use cases are described, along with providing links to more detailed examples, when available. Start with the SQL that is added to your tenant by the “Customer 360” database template, and then add support for all of your required use cases.

Requirements

This topic assumes the following requirements are met:

  • PII semantic tags are applied consistently, including assigning the correct data types, to all feeds that contain customer records. For custom PII semantics and/or non-PII data that must be accessible from the Unified_Coalesced table, you may need to extend the Merged_Customers table to support them.

  • Feeds that contain customer records are made available to Stitch.

  • Each feed has a primary key; foreign keys are applied consistently across all feeds that are made available to Stitch.

  • At least one feed must contain a field that indicates when the data was last updated.

    Apply the update-dt semantic tag to columns in customer records that identify when the data was last updated. The field to which this semantic is applied must be a datetime field type. At least one customer record must have this semantic tag applied to ensure that the update_dt column is created in the Unified_Coalesced table and to ensure that the Merged_Customers table behaves correctly.

  • When multiple columns are present that could be tagged with the phone or email semantic tags, determine which approach to use for applying custom phone semantic tags and custom email semantic tags, and then extend the Merged_Customers table to support custom email or custom phone semantics.

  • You can extend the Merged_Customers table to exclude common or “bad” values. Extensions may be done after the Merged_Customers table is created because they often have both downstream and upstream dependencies within the Amperity workflow that will require some tuning within for the Merged_Customers table.

Add table

The Merged_Customers table is a data table that must be added to a customer 360 database. The purpose of the Merged_Customers table is to collect rows from the Unified_Coalesced table, and then collapse those into a single row per Amperity ID. Columns in the Merged_Customers table are made available to segmentation via the Customer_360 table.

Note

The Merged_Customers table uses a Spark SQL query to pull the Amperity ID, along with all PII data, from the Unified_Coalesced table.

The Unified_Coalesced table contains every row from every stitched table that is associated with an Amperity ID, with defined semantics coalesced into a single column. A unique Amperity ID may appear in more than one row. Columns are added to this table when semantic values match values in the bad-values blocklist.

Important

This table is added automatically when the “Customer 360” template is used to add the customer 360 database. Using a template is the recommended way to add the Merged_Customers table. This section documents how to manually add this table, should it be necessary.

To add the Merged_Customers table

  1. From the Customer 360 tab, under All Databases, select the menu for the customer 360 database, and then click Edit.

  2. From the Database Editor, click Add Table.

  3. Name the table “Merged_Customers”.

  4. Set the build mode to SQL.

  5. Click Apply template, and then select Merged_Customers.

  6. Update the placeholder names (shown as “Domain:Table” in the query) and set field-level priorities.

  7. Click Validate to verify the SQL runs without error.

  8. Click Next. This opens the Database Table Definition page.

  9. Add a table description. This enables a tooltip that is visible from other areas in Amperity.

  10. Verify that the db/required and db/unique database field semantics were applied to the amperity_id column.

  11. Verify that semantic tags–given_name, surname, email, phone, address, city, state, postal, birthdate, gender, etc.–were applied to all PII fields correctly.

    Tip

    You can clear incorrectly tagged semantics individually or for the entire table. To clear semantic tags for the entire table, under Semantic Suggestions click Clear semantics.

  12. From the Table Semantics drop-down, select Merged_Customers.

  13. Under Version History, select Enable table version history.

  14. Click Save.

SQL query

The following SQL query is the recommended starting point for the Merged_Customers table. It exists in two parts: a window function that collects and groups PII data, and then a statement that updates the Merged_Customers table with the results. This query does the following:

  1. Provides a location in which domain tables (i.e. “feeds that are made available to Stitch”) are assigned priority. (This section must be updated to contain the names of the domain tables for your tenant that should be assigned a non-default priority.)

  2. Provides a location in which PII fields are assigned priority. (This section must be updated to contain the names of the domain tables that are assigned source priority, and then be updated to assign priorities to logical groups of fields that contain PII data, e.g. email addresses, physical addresses, and so on.)

  3. Left joins the fields in the Unified_Coalesced table into a temporary table for prioritization.

  4. Left joins the fields in the Unified_Coalesced table to form groupings of PII fields. For example, all address-related fields (address, address2, city, state, postal, and country) are grouped together. This is done for names, addresses, phone numbers, email addresses, birthdates, and gender.

  5. Left joins the fields in the Unified_Coalesced table by logical groups and by assigned priority.

  6. Adds a set of fields to the temporary table for each of the logical groups, which are then added to the Merged_Customers table.

  7. Refreshes the data in the Merged_Customers table.

Refer to the required updates and the example of updates for more information about how to apply the recommended query to the specific domain tables in your tenant.

Required updates

The recommended starting point for the Merged_Customers table requires the following updates:

  • Update the list of domain tables under Source_Priority to contain at least one domain table with fields tagged for PII semantics.

  • Update the list of domain tables under Field_Priority to contain the same list of domain tables, and then specify individual field priorities across all tables. The order of the fields under each Domain:Table must match the order of the fields under fp_datasource.

Source priority

Update the list of domain tables under Source_Priority to contain at least one domain table with fields tagged for PII semantics. Only domain tables that are made available to Stitch may be listed in the source priority. An integer value will assign priority, where 1 has a higher priority than 2. Domain tables may be assigned the same priority. Any domain table not specified here is assigned a default priority of 999.

WITH
  Source_Priority AS (
    SELECT *
    FROM VALUES
      ("Domain:Table",  x)
      ,("Domain:Table", x)
    AS (sp_datasource, priority)
  )

Field priority

Use the field priority table to assign priorities for individual fields that are different from the priority assigned generally to each source domain table Source_Priority. Update the list of tables to match the same domain tables that were specified for Source_Priority, and then update the field priority table to specify individual priorities.

  • A NULL value will use the priority assigned to the source domain table or, when a priority is not assigned by a source domain table, the default priority of 999.

  • An integer value will assign priority, where 1 has a higher priority than 2.

,Field_Priority AS (
  SELECT *
  FROM VALUES
    ("Domain:Table", null, null, null, null, null, null)
    ,("Domain:Table", null, null, null, null, null, null)
  AS (fp_datasource, name_pri, address_pri, email_pri, phone_pri, birthdate_pri, gender_pri)

Caution

The order of the fields under each Domain:Table matches the order of the fields under fp_datasource. Be sure to apply the priority in the correct location. For example, assigning field priority for email addresses:

,null      ,null         ,1          ,null       ,null           ,null

,null      ,null         ,2          ,null       ,null           ,null

,name_pri  ,address_pri  ,email_pri  ,phone_pri  ,birthdate_pri  ,gender_pri

Example of updates

The following example shows three domain tables: Table A, Table B, and Table C, with source priority assigned in the same order. The email address in Table B is assigned the highest priority, the address and phone in Table C is assigned the highest priority, and the name is assigned the highest priority in all three tables.

The database in this tenant also includes three other tables: Table D, Table E, and Table F. These tables contain fields tagged with PII semantics, but none of them are higher priority than Table A, Table B, or Table C. By not adding Table D, Table E, and Table F to the SQL query for the Merged_Customers table, these tables will be automatically assigned the default priority of 999.

WITH
  Source_Priority AS (
    SELECT *
    FROM VALUES
      ("Table:A",  1)
      ,("Table:B", 2)
      ,("Table:C", 3)
    AS (sp_datasource, priority)
  )

  ,Field_Priority AS (
    SELECT *
    FROM VALUES
      ("Table:A",1,2,2,2,null,null)
      ,("Table:B",1,2,1,2,null,null)
      ,("Table:C",1,1,2,1,null,null)
    AS (fp_datasource, name_pri, address_pri, email_pri, phone_pri, birthdate_pri, gender_pri)
  )

Extend PII semantics

You can extend the Merged_Customers table to do some preprocessing to fields that contain PII semantics to improve the quality of the data that ends up in the Customer_360 table:

Add custom semantics

You can create custom semantics for PII and non-PII fields. All custom semantics are added to the Unified_Coalesced table.

  1. Apply the custom semantics to to feeds. Ensure each custom semantic uses a consistent data type across all feeds.

  2. Extend the Merged_Customers table with custom merge rules that load the column created by the custom semantic from the Unified_Coalesced table.

  3. Apply similar merge patterns used for PII semantics.

Within the Merged_Customers table, you must decide if the custom semantic requires source and/or field priority:

For example, a company may have data sources that provide data about its own employees, such as internal email addresses, phone numbers, and so on. Use a custom PII semantic named email-internal to tag the fields that contain internal email addresses. Apply this custom PII semantic to all fields in all data sources that contain internal email addresses.

Consistently tagging fields with internal email addresses with the email-internal custom PII semantic will create a column named email_internal in the Unified_Coalesced table. The Merged_Customers table can use this column to extend the merge rules to include rules for internal email addresses.

Important

Do not apply the default email semantic to the email-internal columns or add the email-internal semantic to any merge rules that handle the presence of namespaced or ordinal custom email semantics.

Assign field priority

Assigning field priority to a custom PII semantic is optional and should only be done when a custom PII semantic is tagged in multiple data sources, requires a priority other than the default priority (999), and requires these data sources to be assigned different priorities.

To extend merge rules to assign field priority

  1. From the Customer 360 tab, under All Databases, open the menu for the customer 360 database, and then select Edit.

  2. Open the Merged_Customers table.

  3. Find the Field_Priority section and update it to add the email_internal column to the field priority list:

    ,Field_Priority AS (
      SELECT *
      FROM VALUES
        ("Domain:Table", null, null, null, null, null, null)
        ,("Domain:Table", null, null, null, null, null, null)
      AS (fp_datasource, name_pri, address_pri, email_pri, phone_pri, birthdate_pri, gender_pri)
    )
    
  4. Find the Unified_Structs sections, and then add a named struct for the email_internal column:

    ,NAMED_STRUCT(
      'email_internal', email_internal
      ,'pk', pk
      ,'update_dt', update_dt
      ,'datasource', datasource
      ,'priority', COALESCE(fp.email_internal_pri, up.priority, 999)
      ,'completion', INT(ISNOTNULL(email_internal))
    ) AS `email_internal_struct`
    
  5. Find the Unified_Prioritized section, and then add a first value block for the email_internal column:

    ,FIRST_VALUE(email_internal_struct)
      OVER (
        PARTITION BY amperity_id
        ORDER BY email_internal_struct.completion DESC
                 ,email_internal_struct.priority
                 ,email_internal_struct.update_dt DESC
                 ,email_internal_struct.pk
      ) AS email_internal_struct
    
  6. Find the SELECT statement that builds the Merged_Customers table, and then add the columns for email_internal:

    ,up.email_internal_struct.email_internal
    ,up.email_internal_struct.pk AS `email_internal_pk`
    ,up.email_internal_struct.update_dt AS `email_internal_update_dt`
    ,up.email_internal_struct.datasource AS `email_internal_datasource`
    ,up.email_internal_struct.priority AS `email_internal_priority`
    ,up.email_internal_struct.completion AS `email_internal_completion`
    
  7. Validate the query.

  8. Click Next. This opens the Database Table Definition page.

  9. Verify these settings, and then click Save.

  10. Run the customer 360 database.

Skip field priority

Field priority is only necessary when a custom PII semantic is tagged in multiple data sources, requires a priority other than the default priority (999), and requires these data sources to be assigned different priorities. Any custom semantic that does not meet this criteria should be assigned a default field priority and should not be added to the columns in the field priority list.

To extend merge rules to skip field priority

  1. From the Customer 360 tab, under All Databases, open the menu for the customer 360 database, and then select Edit.

  2. Open the menu for the Merged_Customers table and select Edit.

  3. Find the Unified_Structs sections, and then add a named struct for the email_internal column:

    ,NAMED_STRUCT(
      'email_internal', email_internal
      ,'pk', pk
      ,'update_dt', update_dt
      ,'datasource', datasource
      ,'priority', COALESCE(up.priority, 999)
      ,'completion', INT(ISNOTNULL(email_internal))
    ) AS `email_internal_struct`
    
  4. Find the Unified_Prioritized section, and then add a first value block for the email_internal column:

    ,FIRST_VALUE(email_internal_struct)
      OVER (
        PARTITION BY amperity_id
        ORDER BY email_internal_struct.completion DESC
                 ,email_internal_struct.priority
                 ,email_internal_struct.update_dt DESC
                 ,email_internal_struct.pk
      ) AS email_internal_struct
    
  5. Find the SELECT statement that builds the Merged_Customers table, and then add the columns for email_internal:

    ,up.email_internal_struct.email_internal
    ,up.email_internal_struct.pk AS `email_internal_pk`
    ,up.email_internal_struct.update_dt AS `email_internal_update_dt`
    ,up.email_internal_struct.datasource AS `email_internal_datasource`
    ,up.email_internal_struct.priority AS `email_internal_priority`
    ,up.email_internal_struct.completion AS `email_internal_completion`
    
  6. Validate the query.

  7. Click Next. This opens the Database Table Definition page.

  8. Verify these settings, and then click Save.

  9. Run the customer 360 database.

Analyze PII by datasource

You can measure how much data a single feed contributes to the larger set of data by using datasource columns in the Merged_Customers table. These columns are associated with the original primary key and updated dates and allow queries to be built in the customer 360 database to analyze the effects of one data source on the larger set of data in the customer 360 database.

The Merged_Customers table is already configured to provide this data. For example:

SELECT
  up.amperity_id
  ,up.address_struct.address
  ,up.address_struct.address2
  ,up.address_struct.city
  ,up.address_struct.state
  ,up.address_struct.postal
  ,up.address_struct.country
  ,up.address_struct.pk AS `address_pk`
  ,up.address_struct.update_dt AS `address_update_dt`
  ,up.address_struct.datasource AS `address_datasource`
  ,up.address_struct.priority AS `address_priority`
  ,up.address_struct.completion AS `address_completion`

If this is missing, you must add the following line to each grouping, updated correctly for each semantic:

,up.address_struct.datasource AS `address_datasource`:

Bad-values blocklist

A bad-values blocklist contains known values that appear frequently in data and should be excluded from the Stitch process.

Important

The bad-values blocklist supports the following PII semantics: given-name, surname, email, phone, and address (which is handled as a combination of fields to which the address, address2, city, state, and postal semantic tags are applied). You do not have to implement all of these semantics as part of the query template for automated blocklist values. Automated blocklist values will not work for address2 (standalone), birthdate, city (standalone), company, full-name, gender, generational-suffix, postal (standalone), state (standalone), title, or for any custom semantic.

When using the bad-values blocklist, you must update the Merged_Customers table include the bad-values blocklist items in the merge rules.

  1. From the Customer 360 tab, under All Databases, open the menu for the customer 360 database, and then select Edit.

  2. Open the menu for the Merged_Customers table and select Edit.

  3. Find the Unified_Prioritized section, and then update the first value blocks for email and phone to add the .blv entries:

    ,FIRST_VALUE(email_struct)
      OVER (
        PARTITION BY amperity_id
        ORDER BY email_struct.completion DESC
                 ,email_struct.blv
                 ,email_struct.priority
                 ,email_struct.update_dt DESC
                 ,email_struct.pk
      ) AS email_struct
    
    ,FIRST_VALUE(phone_struct)
      OVER (
        PARTITION BY amperity_id
        ORDER BY phone_struct.completion DESC
                 ,phone_struct.blv
                 ,phone_struct.priority
                 ,phone_struct.update_dt DESC
                 ,phone_struct.pk
      ) AS phone_struct
    
  4. Find the SELECT statement that builds the Merged_Customers table, and then add the columns for email_internal:

    ,up.email_struct.email
    ,up.email_struct.pk AS `email_pk`
    ,up.email_struct.update_dt AS `email_update_dt`
    ,up.email_struct.datasource AS `email_datasource`
    ,up.email_struct.priority AS `email_priority`
    ,up.email_struct.blv AS `email_blv`
    ,up.email_struct.completion AS `email_completion`
    
    ,up.phone_struct.phone
    ,up.phone_struct.pk AS `phone_pk`
    ,up.phone_struct.update_dt AS `phone_update_dt`
    ,up.phone_struct.datasource AS `phone_datasource`
    ,up.phone_struct.priority AS `phone_priority`
    ,up.phone_struct.blv AS `phone_blv`
    ,up.phone_struct.completion AS `phone_completion`
    
  5. Validate the query.

  6. Click Next. This opens the Database Table Definition page.

  7. Verify these settings, and then click Save.

  8. Run the customer 360 database.

Note

If you add other PII semantics to the bad-values blocklist, such as for addresses, be sure to make the same changes to the structs.

Concatenate full names

The following example shows how to extend the Merged_Customers table to concatenate full names based on the given-name and surname semantic tags.

To concatenate full names

  1. From the Customer 360 tab, under All Databases, open the menu for the customer 360 database, and then select Edit.

  2. Open the menu for the Merged_Customers table and select Edit.

  3. Find the Unified_Preprocessed section and add the highlighted line:

    ,Unified_Preprocessed AS (
      SELECT *
      ,CONCAT_WS(' ', given_name, surname) AS full_name_p
      FROM Unified_Coalesced uc
      LEFT JOIN Source_Priority sp ON sp.sp_datasource = uc.datasource
    )
    
  4. Find the NAMED_STRUCT with the name column, and then update it to use the coalesced column full_name_p. This must be done in the following locations:

    ,NAMED_STRUCT(
      'given_name', given_name
      ,'surname', surname
      ,'full_name', full_name_p
      ,'generational_suffix', generational_suffix
      ,'pk', pk
      ,'update_dt', update_dt
      ,'datasource', datasource
      ,'priority', COALESCE(fp.name_pri, up.priority, 999)
      ,'completion', INT((ISNOTNULL(given_name)
                     AND ISNOTNULL(surname))
                     OR ISNOTNULL(full_name_p))
    
    ) AS `name_struct`
    
  5. Validate the query.

  6. Click Next. This opens the Database Table Definition page.

  7. Verify these settings, and then click Save.

  8. Run the customer 360 database.

Multiple email addresses

The following example shows how to extend the Merged_Customers table to support email addresses that are tagged with namespaced custom email semantic tags: email-home and email-work. These tags are applied in addition to the default email semantic. These tags will create two columns in the Unified_Coalesced table: email_home and email_work.

To support custom email semantics

  1. From the Customer 360 tab, under All Databases, open the menu for the customer 360 database, and then select Edit.

  2. Open the menu for the Merged_Customers table and select Edit.

  3. Find the Unified_Preprocessed section and add the highlighted line:

    ,Unified_Preprocessed AS (
      SELECT *
      ,COALESCE(email_home, email_work) AS email_p
      FROM Unified_Coalesced uc
      LEFT JOIN Source_Priority sp ON sp.sp_datasource = uc.datasource
    )
    

    Replace email_home and email_work with the correct names for the custom semantic tags pattern you are using for your tenant.

  4. Find the NAMED_STRUCT with the email column, and then update it to use the coalesced column email_p. This must be done in the following locations:

    ,NAMED_STRUCT(
      'email', email_p
      ,'pk', pk
      ,'update_dt', update_dt
      ,'datasource', datasource
      ,'priority', COALESCE(fp.email_pri, up.priority, 999)
      ,'completion', INT(ISNOTNULL(email_p))
    ) AS `email_struct`
    
  5. Validate the query.

  6. Click Next. This opens the Database Table Definition page.

  7. Verify these settings, and then click Save.

  8. Run the customer 360 database.

Multiple phone numbers

The following example shows how to extend the Merged_Customers table to support multiple phone numbers that are tagged with ordinal custom semantic tags: phone-1 and phone-2. These tags are applied in addition to the default phone semantic. These tags will create two columns in the Unified_Coalesced table: phone_1 and phone_2.

To support custom phone semantics

  1. From the Customer 360 tab, under All Databases, open the menu for the customer 360 database, and then select Edit.

  2. Open the menu for the Merged_Customers table and select Edit.

  3. Find the Unified_Preprocessed section and add the highlighted line:

    ,Unified_Preprocessed AS (
      SELECT *
      ,COALESCE(phone_1, phone_2) AS phone_p
      FROM Unified_Coalesced uc
      LEFT JOIN Source_Priority sp ON sp.sp_datasource = uc.datasource
    )
    

    Replace phone_1 and phone_2 with the correct names for the custom semantic tags pattern you are using for your tenant.

  4. Find the NAMED_STRUCT with the phone column, and then update it to use the coalesced column phone_p. This must be done in the following locations:

    ,NAMED_STRUCT(
      'phone', phone_p
      ,'pk', pk
      ,'update_dt', update_dt
      ,'datasource', datasource
      ,'priority', COALESCE(fp.phone_pri, up.priority, 999)
      ,'completion', INT(ISNOTNULL(phone_p))
    ) AS `phone_struct`
    
  5. Validate the query.

  6. Click Next. This opens the Database Table Definition page.

  7. Verify these settings, and then click Save.

  8. Run the customer 360 database.