Unified_Coalesced table

The Unified_Coalesced table contains all the PII data that has been processed through Stitch. The data is organized by the semantic tag as the column heading and then coalesced into one single table. A unique Amperity ID may appear in more than one row.

The Unified_Coalesced table is added to databases in the Customer 360 tab when:

  1. The Stitch QA database is added using the “Stitch QA” database template.

  2. A customer 360 database is added using the “Customer 360” database template.

  3. A table is added to a custom database using the Unified_Coalesced table as a passthrough table.

Use with Stitch QA

Use the Unified_Coalesced table as part of the Stitch QA process to:

  • Ensure semantic tags are selected and visible to the Queries page

  • Investigate relationships between records, especially when records appear to be assigned (not assigned) to an incorrect (correct) Amperity ID, which defines a single cluster of records

  • Investigate blocklisted values

Tip

The Unified_Coalesced and Unified_Preprocessed_Raw tables are frequently used together during Stitch QA because many investigation patterns will compare the values in one table to the values in the other.

Profile semantics and foreign keys

Use the Customer 360 page to verify that rows with profile semantic tags and foreign keys are selected to be available to the Queries page. Has the list of profile and/or foreign key semantics changed since the last time you have performed Stitch QA? This will occur when a data source is added that requires a new foreign key or in a situation where one of the less frequently used profile semantic tags is applied to a new data source.

Open the table in the Database Explorer and verify that all rows in the table that are associated with a semantic tag or a foreign key have a checkmark in the left column. Rows without a checkmark will not make the associated FIELD available to the Queries page. If rows do not have a checkmark, edit the table and apply the checkmark, save the table, activate the Stitch QA database, and then run the database to refresh the table.

Relationships between records

Use the Queries page to investigate unexpected relationships between individual records that share the same Amperity ID and to investigate why records that look like they should belong to the same cluster of records were split.

Use a combination of the Unified_Coalesced and Unified_Preprocessed_Raw tables to review the data that is associated with these records to help understand why groups of records were matched or why certain records were assigned to a cluster.

In some cases, you may need to investigate and compare entire clusters to better understand why (or why not) these groups of records were maintained (or split) during hierarchical comparison, a step in the Stitch process that occurs after pairwise comparison and scoring.

Global blocklist values

The Unified_Coalesced table provides additional data about individual records, such as blocklisted values, component IDs, or if a record belongs to a supersized cluster.

In some cases, groups of records may have values removed by the global bad-values blocklist instead of the tenant-specific bad-values blocklist.

Compare records in the Unified_Coalesced table to records in the Unified_Preprocessed_Raw table to identify which values were removed.

Column reference

The Unified_Coalesced table contains the following columns:

Column Name

Data type

Description

amperity_id

String

The unique identifier that is assigned to clusters of customer records that all represent the same individual. The Amperity ID does not replace primary and foreign keys, but exists alongside them within unified profiles.

Note

The Amperity ID is a universally unique identifier (UUID) that is represented by 36 characters spread across five groups separated by hyphens: 8-4-4-4-12.

For example:

123e4567-e89b-12d3-a456-426614174000

address

String

The address that is associated with the location of an individual customer record. For example: 123 Main Street.

Values in this column depend on fields that are tagged with the address semantic.

Also in: Customer_360, Merged_Customers, Unified_Customer, Unified_Preprocessed_Raw

address2

String

Additional address information, such as an apartment number or a post office box, that is associated with the location of an individual customer record. For example: Apt #9.

Values in this column depend on fields that are tagged with the address2 semantic.

Also in: Merged_Customers, Unified_Customer

birthdate

Date

The date of birth that is associated with an individual customer record.

Values in this column depend on fields that are tagged with the birthdate semantic.

Also in: Customer_360, Merged_Customers, Unified_Customer, Unified_Preprocessed_Raw

blv_address

Boolean

When true, the address on this customer record matches a blocklist value.

blv_email

Boolean

When true, the email on this customer record matches a blocklist value.

blv_given_name

Boolean

When true, the given-name on this customer record matches a blocklist value.

blv_phone

Boolean

When true, the phone on this customer record matches a blocklist value.

blv_surname

Boolean

When true, the surname on this customer record matches a blocklist value.

city

String

The city that is associated with the location of an individual customer record.

Values in this column depend on fields that are tagged with the city semantic.

Also in: Customer_360, Merged_Customers, Unified_Customer, Unified_Preprocessed_Raw

ck

String

The ck semantic tag may be applied to a column that contains pre-existing, tenant-specific customer IDs. When customer keys are applied, Amperity compares them to the Amperity ID as part of the deduplication process.

Tip

What happens to customer keys in the Unified_Coalesced table?

  • Records may have NULL customer keys.

  • There may be only one customer key per data source.

  • There may be multiple customer keys per Amperity ID. This is because customer keys may also be tagged as foreign keys.

Also in: Unified_Customer

component_id

Integer

An identifier that represents a set of records that are transitively connected with a score above threshold as an outcome of blocking and initial scoring. Records that share a component ID, but have different Amperity IDs, were split during hierarchical comparison.

Tip

Records with different component_id values may show as having been blocked together. This can occur when a connecting record pair was removed because it scored below the pairwise comparison threshold.

Also in: Unified_Preprocessed_Raw

country

String

The country that is associated with the location of an individual customer record.

Values in this column depend on fields that are tagged with the country semantic.

Important

The country field is added to the Unified_Coalesced table when fields are tagged with the country profile semantic.

Also in: Merged_Customers, Unified_Customer

create_dt

String

Apply the create-dt semantic tag to columns in customer records that identify when the data was created. The field to which this semantic is applied must be a datetime field type.

Also in: Merged_Customers, Unified_Customer

datasource

String

The name of the data source from which this customer record originated.

Tip

The combination of pk and datasource uniquely identifies a row in the Unified_Coalesced table, which correlates to a single row in a domain table.

email

String

The email address that is associated with an individual customer record. A customer record may be associated with multiple email addresses.

Values in this column depend on fields that are tagged with the email semantic.

Also in: Customer_360, Merged_Customers, Unified_Customer, Unified_Preprocessed_Raw

fk_[name]

String

The fk-[namespace] semantic tag identifies a field as a foreign key. A foreign key semantic tag must be namespaced. For example: fk-customer, fk-interaction, fk-audience, or fk-brand.

A column is added for each foreign key that is defined in the Sources tab.

Tip

What happens to foreign keys in the Unified_Coalesced table?

  • Records may have NULL foreign keys.

  • There may be multiple foreign keys in the data source, but there may not be duplicate foreign keys.

  • There may be multiple foreign keys per Amperity ID.

  • There should not be multiple Amperity IDs per foreign key.

Also in: Unified_Customer, Unified_Preprocessed_Raw

full_name

String

A combination of given name (first name) and surname (last name) that is associated with an individual customer record and is stored as a combined value in a single field within customer data. A full name may include a middle name or initial.

Values in this column depend on fields that are tagged with the full-name semantic.

Also in: Merged_Customers, Unified_Customer

gender

String

The gender that is associated with an individual customer record.

Values in this column depend on fields that are tagged with the gender semantic.

Also in: Customer_360, Merged_Customers, Unified_Customer, Unified_Preprocessed_Raw

generational_suffix

String

The suffix that identifies to which family generation a customer record belongs. For example: Jr., Sr. II, and III.

Also in: Merged_Customers, Unified_Customer

given_name

String

The first name that is associated with an individual customer record.

Values in this column depend on fields that are tagged with the given-name semantic.

Also in: Customer_360, Merged_Customers, Unified_Customer, Unified_Preprocessed_Raw

has_blv

Boolean

The has_blv column indicates if blocklist values for address, email, phone, given-name, or surname are present in customer records.

is_supersized

Boolean

Indicates when a rough heuristic is applied to the first grouping of records (rep_pk) to partition supersized records into smaller components. Supersized records occur when more than 500 groups are associated with the first grouping of records.

loyalty_id

String

The identifier for a loyalty program that is associated with an individual customer record.

Also in: Merged_Customers

phone

String

The phone number that is associated with an individual customer record. A customer record may be associated with multiple phone numbers.

Values in this column depend on fields that are tagged with the phone semantic.

Also in: Customer_360, Merged_Customers, Unified_Customer, Unified_Preprocessed_Raw

pk

String

The combination of data source and primary key allows Amperity to uniquely identify every row in every data table across the entirety of customer data input to Amperity.

Tip

The combination of pk and datasource uniquely identifies a row in the Unified_Coalesced table, which correlates to a single row in a domain table.

postal

String

The zip code or postal code that is associated with the location of an individual customer record.

Values in this column depend on fields that are tagged with the postal semantic.

Also in: Customer_360, Merged_Customers, Unified_Customer, Unified_Preprocessed_Raw

rep_ds

Integer

The rep_ds column shows the datasource that is associated with the rep_pk column.

rep_pk

Integer

The rep_pk column is an identifier that represents the first grouping of records done by Stitch. This grouping is based on identical semantic patterns.

Tip

The combination of rep_ds and rep_pk represent qualified trivial duplications, which are records with enough identical PII to score 3.0 (or greater) and were grouped together by Stitch early in the identity resolution process.

All qualified trivial duplicates are treated as a single record by downstream Stitch processes.

The rep_ds and rep_pk fields are included in the Unified_Coalesced table to help with situations where it’s necessary to understand why two records were not clustered together.

state

String

The state or province that is associated with the location of an individual customer record.

Values in this column depend on fields that are tagged with the state semantic.

Also in: Customer_360, Merged_Customers, Unified_Customer, Unified_Preprocessed_Raw

supersized_id

Integer

An identifier that represents supersized records that were partitioned into smaller components.

Also in: Unified_Preprocessed_Raw

surname

String

The last name that is associated with an individual customer record.

Values in this column depend on fields that are tagged with the surname semantic.

Also in: Customer_360, Merged_Customers, Unified_Customer, Unified_Preprocessed_Raw

title

String

The title that precedes a full name that is associated with an individual customer record. For example: Mr., Mrs, and Dr.

Also in: Merged_Customers, Unified_Customer

update_dt

String

Apply the update-dt semantic tag to columns in customer records that identify when the data was last updated. The field to which this semantic is applied must be a datetime field type. At least one customer record must have this semantic tag applied to ensure that the update_dt column is created in the Unified_Coalesced table and to ensure that the Merged_Customers table behaves correctly.

Also in: Merged_Customers, Unified_Customer