Review Domain Data

A domain table is created for each data feed that has been configured to ingest a data source into Amperity. The data in a domain table is processed to include semantic tagging alongside the original fields from the data feed. Domain tables that contain customer records are made available to the Stitch process to identify unique individuals and assign them Amperity IDs. Domain tables that contain interaction records are used to create attributes that are associated with the unique individuals who have been assigned Amperity IDs.

Review domain data

The Domain Tables section of the Sources tab lists the domain tables that have been generated by feeds. Each row shows the name of the domain table, the type of record contained within that domain table (customer or interaction), and the number of columns.

Amperity-added columns

Amperity adds the following columns to all domain tables. The added columns start with underscores (_) and are used by Amperity during Stitch processing.

  1. The _pk column is an identifier that is generated based on the all of the columns in the feed that were associated to the primary key.

  2. The _uuid_pk column contains a system-generated UUID. This UUID helps Amperity distribute work during Stitch processing.

  3. The _updated column contains details about the last update and is system-generated.

These columns will be available in the customer 360 database when a domain table is configured as a passthrough table.

Column types

Columns in domain tables may be one of the following types:

Type

Description

Boolean

A value that represents an either/or, such as true or false, 0 or 1, true or NULL.

date

An ISO-8601 compliant date values, such as a birthdate. For example:

  • 2021-11-04

datetime

An ISO-8601 compliant date and time values=, such as a purchase or transaction, the time at which data was last updated, or a campaign launch date. For example:

  • Mon Nov 30 2020 16:00:00 GMT-0800 (Pacific Standard Time)

  • Sat Sep 02 2017 14:36:19 GMT-0700 (Pacific Daylight Time)

Important

Some fields that store datetime values are set to the string data type.

decimal

A fixed point number, such as for prices or message sizes. (The number of characters in the decimal value is configurable during feed setup.). For example:

  • 1.50 (prices)

  • 1874.380 (message sizes)

  • 141.48042 (order subtotals)

float

A floating point number. (Use decimal for prices.) For example:

  • 3.14

  • 3.14159

integer

A numeric value, such as the quantity of items purchased. Do not use data type for prices. For example:

  • 1

  • 12345

string

A sequence of characters, such as first and last names, email addresses, physical addresses, UUIDs (and other IDs), phone numbers, zip codes, product names, descriptions, and so on. May be empty. For example:

  • John

  • Smith

  • John Smith

  • johnsmith @ domain.com

  • 123 Main Street

  • 206-555-1111

  • 00002ac0-0915-3cb4-b7c7-5ee192b3bd49

  • ACME

  • pants

  • “A data source that pulls from an Amazon S3 bucket.”

Record types

Two important types of data are present in domain tables:

Note

A domain table may contain both customer and interaction records. As part of the Stitch process, customer records and interaction records are split into dedicated tables for use within the customer 360 database. A domain table may contain other types of data.

Customer records

A customer record is a row in a customer data table that contains information (columns) about the customer. Who they are, where they live, and how much they spend. For example, a email list table contains names, email addresses, phone numbers, and so on. All domain tables that contain customer records are stitched together to create Amperity IDs.

Customer records are defined by the presence of data that can be assigned semantic tags for customer profile data, specifically to fields that contain personally identifiable information (PII).

Interaction records

An interaction record is a row in a customer data table that contains information about customer behavior, such as purchases (items bought, items returned, costs of items, etc.) and preferences (brands, products, cart adds, etc.).

Interaction records are defined by the presence of data that can be assigned semantic tags for transactions, itemized transactions, product catalogs, and other behavior data, such as semantic tags for loyalty programs.

Interaction records often require data to be reshaped using domain SQL (and custom domain tables) to ensure that the right combination of fields are present in the data to support components and workflows within AmpIQ, including functionality that is available from the Metrics tab, the Segments tab, predicted modeling, and the Campaigns tab.

Reshape domain data

Domain SQL is a reference to using Spark SQL to reshape data that has been loaded to Amperity prior to making it available to downstream processes, such as Stitch or the customer 360 database. Domain SQL is often used to build new tables from existing domain tables, and then reshaping that data into a new table that allows semantic tags for transactions and itemized transactions to be applied correctly.

Note

A custom domain table should be designed to be as static as possible. A change in the source data will often require a change to the SQL that defines the custom domain table, which may require a complete reload of data for that data source.

About domain SQL

Domain SQL is a reference to using Spark SQL to reshape data that has been loaded to Amperity prior to making it available to downstream processes, such as Stitch or the customer 360 database. Domain SQL is often used to build new tables from existing domain tables, and then reshaping that data into a new table that allows semantic tags for transactions and itemized transactions to be applied correctly.

Use Spark SQL to reshape domain tables into custom domain tables. A reshaped domain table acts like a feed, in that you can apply semantic tags, identify a primary key, make the table available to Stitch, and so on.

Examples

The following examples show using Spark SQL to reshape domain data:

Build birthdates

If incoming data contains birthdate data split by day, month, and year, you can build a complete birthdate using an ingest query. For example, incoming data has the following fields:

----- ------- ------
 day   month  year
----- ------- ------
 08    12     1969
 11    25     1978
 09    15     1981
----- ------- ------

The following example uses the IF() function to concatenate three fields together using a forward slash (/) as a separator:

SELECT
  *
  ,IF(birth_month != '0' AND birth_day != '0' AND  birth_year != '0',
      birth_month||'/'||birth_day||'/'||birth_year, NULL) AS birthdate
FROM Source_Table

Combine postal codes

Some data sources do not contain fields for complete postal codes and instead contain fields that separate the five- and four-digit codes. Some use cases require a single field for postal codes that includes both components, after which the postal semantic tag is applied.

The following example shows how to use a CASE statement to do the following:

  1. Find situations where the five- and four-digit codes are both present, and then combine them.

  2. Find situations where only the five-digit code is present, and then use only the five-digit code.

  3. Uses the CONCAT_WS function to return zip_code and zip_code_plus_four separated by -.

  4. Use NULL for situations where the five-digit code is not present.

  5. Return as the postal field, to which the postal semantic tag may be applied.

,CASE
  WHEN zip_code != '(NULL)' AND zip_code_plus_four != '(NULL)'
  THEN CONCAT_WS('-',zip_code, zip_code_plus_four)

  WHEN zip_code != '(NULL)'
  THEN zip_code

  ELSE NULL
END AS postal

Extract names

Some data sources do not contain fields that can be directly assigned the given-name and surname semantic tags. These tags are important to downstream Stitch processes. When a field is present in the data source that contains data that can be tagged with the full-name semantic tag, you can use domain SQL to extract the first and last name details from that field, add them as new columns, and then apply the correct semantic tags.

Use the REGEXP_EXTRACT() function to:

  1. Trim whitespace from before (or after) the first and last names.

  2. Individually extract the first and last names from the field that contains the full name.

  3. Add columns for the first and last names.

The following example shows part of a SELECT statement that extracts first and last names from the BILLING_NAME field, and then adds columns for first and last names:

,REGEXP_EXTRACT(TRIM(BILLING_NAME),'(^\\S*)',1) AS GIVEN_NAME
,REGEXP_EXTRACT(TRIM(BILLING_NAME),'((?<=\\s).*)',1) AS SURNAME
,TRIM(BILLING_NAME) AS `BILLING_NAME`

Hash profile data

Some segments send results downstream to support CCPA and/or GDPR workflows. Some CCPA and GDPR workflows send this data back to Amperity, which typically requires the data to be hashed using a domain table.

For example, to hash the name, email, and phone fields in a table named tohash_ccpa:

SELECT
  *
  ,SHA2(UPPER(TRIM(firstname))) AS Hash_firstname
  ,SHA2(UPPER(TRIM(lastname))) AS Hash_lastname
  ,SHA2(UPPER(TRIM(email))) AS Hash_email
  ,SHA2(UPPER(TRIM(phone_number))) AS Hash_phone
FROM tohash_ccpa

Parse multiple separators

Sometimes incoming data will contain data that should be tagged with more than one semantic tag, but also contain different separators within the incoming field. For example:

----------- ---------- ------------------- ------- ---------------------- -------
 firstName   lastName   street              poBox   location               zip
----------- ---------- ------------------- ------- ---------------------- -------
 John        Smith      123 Main            #101    US - Yelm , WA         98597
 Andy        Jones      456 South Avenue            US - Bellingham, WA    98115
 Anne        Andersen   999 S. Bergen Way           US - Seattle ,WA       98104
----------- ---------- ------------------- ------- ---------------------- -------

where location represents country, city, and state, always separated with a dash (-) between the country and city, and then a comma (,) between the city and the state. Some fields contain extra white space between and around the strings.

The location field needs to be split into individual city, state, and country fields, the two delimiters can be removed, along with the extra whitespace.

Use domain SQL similar to the following:

SELECT
  location
  ,TRIM(SPLIT(location, '-')[0]) AS country
  ,TRIM(SPLIT(SPLIT(location, '-')[1],',')[0]) AS city
  ,TRIM(SPLIT(location, ',')[1]) AS state
FROM domain_table

and then tag the city, state, and country fields with the appropriate semantic tags.

Set non-USA email to NULL

The following CASE statement decodes customer emails, identifies customer emails that are not encoded using the US-ASCII character set, and then sets them to NULL.

CASE
  WHEN UPPER(DECODE(UNBASE64(customer_email),'US-ASCII')) = 'UNDEFINED'
  THEN NULL
  ELSE UPPER(DECODE(UNBASE64(customer_email),'US-ASCII'))
END AS email,

Standardize phone numbers

The following example uses the TRIM() function to find empty or NULL phone numbers, sets them to NULL, and then standardizes all phone numbers to add a +1:

CASE
  WHEN TRIM(phone) LIKE ''
  OR TRIM(phone) IS NULL
  THEN NULL
  ELSE CONCAT('+1','',REGEXP_REPLACE(phone,'[^0123456789]',''))
END AS phone

Standardize USA states

The following example standardizes values for all fifty states in the United States to only a two-character value (AK, AL, AR, etc.). The CASE statement uses the following strings to determine:

  1. The correct two-character value

  2. The correct spelled out value

  3. Other variations that appear in the data, which may be common (or known) abbreviations, misspellings, slang, or shortcuts

CASE
  WHEN UPPER(TRIM(COALESCE(state))) IN ('AL','ALABAMA', 'BAMA') THEN 'AL'
  WHEN UPPER(TRIM(COALESCE(state))) IN ('AK','ALASKA') THEN 'AK'
  WHEN UPPER(TRIM(COALESCE(state))) IN ('AZ','ARIZONA') THEN 'AZ'
  WHEN UPPER(TRIM(COALESCE(state))) IN ('AR','ARKANSAS') THEN 'AR'
  WHEN UPPER(TRIM(COALESCE(state))) IN ('CA','CALIF','CALIFORNIA','CALIFORNIZ','CALIFRONIA') THEN 'CA'
  WHEN UPPER(TRIM(COALESCE(state))) IN ('CO','COLORADO') THEN 'CO'
  WHEN UPPER(TRIM(COALESCE(state))) IN ('CT','CONNECTICUT', 'CONNETICUT') THEN 'CT'
  WHEN UPPER(TRIM(COALESCE(state))) IN ('DE','DELAWARE', 'DELWARE') THEN 'DE'
  WHEN UPPER(TRIM(COALESCE(state))) IN ('FL','FLORIDA') THEN 'FL'
  WHEN UPPER(TRIM(COALESCE(state))) IN ('GA','GEORGIA') THEN 'GA'
  WHEN UPPER(TRIM(COALESCE(state))) IN ('HI','HAWAII', 'HAWAI\'I') THEN 'HI'
  WHEN UPPER(TRIM(COALESCE(state))) IN ('ID','IDAHO') THEN 'ID'
  WHEN UPPER(TRIM(COALESCE(state))) IN ('IL','ILLINOIS') THEN 'IL'
  WHEN UPPER(TRIM(COALESCE(state))) IN ('IN','INDIANA') THEN 'IN'
  WHEN UPPER(TRIM(COALESCE(state))) IN ('IA','IOWA') THEN 'IA'
  WHEN UPPER(TRIM(COALESCE(state))) IN ('KS','KANSAS') THEN 'KS'
  WHEN UPPER(TRIM(COALESCE(state))) IN ('KY','KENTUCKY') THEN 'KY'
  WHEN UPPER(TRIM(COALESCE(state))) IN ('LA','LOUISIANA', 'LOUSIANA') THEN 'LA'
  WHEN UPPER(TRIM(COALESCE(state))) IN ('ME','MAINE') THEN 'ME'
  WHEN UPPER(TRIM(COALESCE(state))) IN ('MD','MARYLAND') THEN 'MD'
  WHEN UPPER(TRIM(COALESCE(state))) IN ('MA','MASS','MASSACHUSETES','MASSACHUSETTS','MASSACHUSETTES') THEN 'MA'
  WHEN UPPER(TRIM(COALESCE(state))) IN ('MI','MICHIGAN') THEN 'MI'
  WHEN UPPER(TRIM(COALESCE(state))) IN ('MN','MINNESOTA') THEN 'MN'
  WHEN UPPER(TRIM(COALESCE(state))) IN ('MS','MISSISSIPPI') THEN 'MS'
  WHEN UPPER(TRIM(COALESCE(state))) IN ('MO','MISSOURI') THEN 'MO'
  WHEN UPPER(TRIM(COALESCE(state))) IN ('MT','MONTANA') THEN 'MT'
  WHEN UPPER(TRIM(COALESCE(state))) IN ('NE','NEBRASKA') THEN 'NE'
  WHEN UPPER(TRIM(COALESCE(state))) IN ('NV','NEVADA') THEN 'NV'
  WHEN UPPER(TRIM(COALESCE(state))) IN ('NH','NEW HAMPSHIRE') THEN 'NH'
  WHEN UPPER(TRIM(COALESCE(state))) IN ('NJ','NEW JERSEY', 'JERSEY') THEN 'NJ'
  WHEN UPPER(TRIM(COALESCE(state))) IN ('NM','NEW MEXICO') THEN 'NM'
  WHEN UPPER(TRIM(COALESCE(state))) IN ('NY','NEW YORK') THEN 'NY'
  WHEN UPPER(TRIM(COALESCE(state))) IN ('NC','NORTH CAROLINA') THEN 'NC'
  WHEN UPPER(TRIM(COALESCE(state))) IN ('ND','NORTH DAKOTA') THEN 'ND'
  WHEN UPPER(TRIM(COALESCE(state))) IN ('OH','OHIO') THEN 'OH'
  WHEN UPPER(TRIM(COALESCE(state))) IN ('OK','OKLAHOMA') THEN 'OK'
  WHEN UPPER(TRIM(COALESCE(state))) IN ('OR','ORE','OREGON','OREGONE') THEN 'OR'
  WHEN UPPER(TRIM(COALESCE(state))) IN ('PA','PENNSYLVANIA') THEN 'PA'
  WHEN UPPER(TRIM(COALESCE(state))) IN ('RI','RHODE ISLAND') THEN 'RI'
  WHEN UPPER(TRIM(COALESCE(state))) IN ('SC','SOUTH CAROLINA') THEN 'SC'
  WHEN UPPER(TRIM(COALESCE(state))) IN ('SD','SOUTH DAKOTA') THEN 'SD'
  WHEN UPPER(TRIM(COALESCE(state))) IN ('TN','TENNESSEE') THEN 'TN'
  WHEN UPPER(TRIM(COALESCE(state))) IN ('TX','TEXAS') THEN 'TX'
  WHEN UPPER(TRIM(COALESCE(state))) IN ('UT','UTAH') THEN 'UT'
  WHEN UPPER(TRIM(COALESCE(state))) IN ('VT','VERMONT') THEN 'VT'
  WHEN UPPER(TRIM(COALESCE(state))) IN ('VA','VIRGINIA') THEN 'VA'
  WHEN UPPER(TRIM(COALESCE(state))) IN ('WA','WASHINGTON') THEN 'WA'
  WHEN UPPER(TRIM(COALESCE(state))) IN ('WV','WEST VIRGINIA') THEN 'WV'
  WHEN UPPER(TRIM(COALESCE(state))) IN ('WI','WISCONSIN') THEN 'WI'
  WHEN UPPER(TRIM(COALESCE(state))) IN ('WY','WYOMING') THEN 'WY'
ELSE NULL