Configure feeds
A feed defines how data should be loaded into a domain table. This includes specifying which columns are required and which columns should be associated with a semantic tag that indicates the column contains customer profile (PII) data or transactions data.
Set the primary key
A primary key is a column in a data table that uniquely identifies each row in a data source or data table.
At least one field must be set as a primary key. Any feed that contains customer records or interaction records must have a field that can be associated with a primary key. This is typically an obvious field, like a customer ID or transaction ID, but some data sources are not as clear. You may tag more than one field as the primary key.
To set the primary key
From the Sources page, open the menu for a feed, and then select Edit. The Feed Editor page opens.
From the Primary Key drop-down, select a field from the list.
Click Activate.
Tip
After loading data, the number of records in a domain table may not match the number of records loaded by Amperity. Amperity uses an UPSERT process when loading data and determines priority based on the Last Updated Field. If a large difference exists, take a close look at the primary key to determine whether it is the cause.
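One way to investigate a record count difference before loading is to check whether the candidate primary key is actually unique in the source data. The following is a minimal Python sketch run outside of Amperity; the function name `duplicate_keys` and the sample field names are hypothetical and are not part of the Feed Editor.

```python
from collections import Counter

def duplicate_keys(rows, key_fields):
    """Return each key value that appears on more than one row, with its count."""
    counts = Counter(tuple(row[field] for field in key_fields) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}

# Hypothetical sample data: customer_id repeats, so it is not unique on its own.
rows = [
    {"customer_id": "C1", "email": "a@example.com"},
    {"customer_id": "C2", "email": "b@example.com"},
    {"customer_id": "C2", "email": "b2@example.com"},
]

print(duplicate_keys(rows, ["customer_id"]))           # {('C2',): 2}
print(duplicate_keys(rows, ["customer_id", "email"]))  # {}
```

Any key reported here would be collapsed by an upsert, which is one reason the domain table can contain fewer records than were loaded.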
Apply semantic tags
Semantic tags must be defined for every feed that will provide profile data to Stitch. This ensures that data from rich sources of profile data are brought into Amperity in a consistent manner, which improves the outcome of the Stitch process.
Each group of semantic tags (customer profile semantic tags, interaction semantic tags, and key semantic tags) allows for a range of options.
Customer profiles
Personally identifiable information (PII) is any data that could potentially identify a specific individual. PII data includes details like names, addresses, email addresses, and other profile attributes, but can also include attributes like a loyalty number, customer relationship management (CRM) system identifiers, and foreign keys in customer data.
A PII semantic tag labels customer data consistently so that PII data is more easily discovered across many sets of data.
Profile semantics should be applied to customer records that contain three (or more) good sources of PII data. Profile semantics should be applied to interaction records only when customer records are stored alongside transaction details and when there are three (or more) good sources of PII data.
The following table lists the tags available to this semantic group:
Semantic name | Datatype | Description |
---|---|---|
address | String | The address that is associated with the location of a customer, such as “123 Main Street”. |
address2 | String | Additional address information, such as an apartment number or a post office box, that is associated with the location of a customer. For example: “Apt #9”. |
birthdate | Date | The date of birth that is associated with a customer. Tip: A field that is tagged with the birthdate semantic tag will return an error when the feed is saved and the data type is not set to Date. |
city | String | The city that is associated with the location of a customer. |
company | String | The company, typically an employer or small business, that is associated with a customer. |
country | String | The country that is associated with the location of a customer. Important: A field to which the country semantic tag is applied is added to the Unified Coalesced table, but is otherwise ignored by Stitch. |
create-dt | Datetime | Apply the create-dt semantic tag to columns in customer records that identify when the data was created. The field to which this semantic is applied must be a datetime field type. |
email | String | The email address that is associated with a customer. A customer may have more than one email address. |
full-name | String | A combination of given name (first name) and surname (last name) for a customer. May include a middle name or initial. |
gender | String | The gender that is associated with a customer. Supported values for fields associated with the gender semantic tag include: |
generational-suffix | String | The suffix that identifies to which family generation a customer record belongs. For example: Jr., Sr., II, and III. Caution: The generational-suffix semantic tag should only be applied once per feed and only to a field that contains the suffix separated from the first and last names. |
given-name | String | The first name that is associated with a customer. Caution: The given-name semantic tag may only be applied once per feed. |
phone | String | The phone number that is associated with a customer. A customer may have more than one phone number. Tip: A field that is tagged with the phone semantic tag will return an error when the feed is saved and the data type is not set to String. Important: Amperity uses the last 10 digits of phone numbers for identity resolution. Use the input validation report to find data sources that contain records with phone numbers that exceed 10 digits. You should exclude extensions from phone numbers whenever possible. You may use a custom domain table to apply additional formatting to phone numbers, such as removing extensions. Alternatively, provide data sources to Amperity that have already removed the extensions or have moved them into a different field. |
postal | String | The zip code or postal code that is associated with the location of a customer. A full 9-digit zip code is derived from fields that contain zip code data. Tip: A field that is tagged with the postal semantic tag will return an error when the feed is saved and the data type is not set to String. |
state | String | The state or province that is associated with the location of a customer. |
surname | String | The last name that is associated with a customer. Caution: The surname semantic tag may only be applied once per feed. |
title | String | The title that precedes a full name that is associated with a customer, such as “Mr.”, “Mrs.”, and “Dr.”. |
update-dt | Datetime | Apply the update-dt semantic tag to columns in customer records that identify when the data was last updated in the source system. The field to which this semantic is applied must be a datetime field type. At least one customer record must have this semantic tag applied to ensure that the update_dt column is created in the Unified Coalesced table and to ensure that the Merged Customers table behaves correctly. |
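Several rules in the table above can be checked against a feed definition before it is saved: birthdate requires a Date data type, phone and postal require String, and given-name, surname, and generational-suffix are applied once per feed. The following Python sketch illustrates those checks, along with the "last 10 digits" behavior described for phone numbers. It is an illustration only; `check_feed_tags`, `last_ten_digits`, and the tuple layout are hypothetical and do not represent an Amperity API.

```python
# Data types the table above says these tags require.
REQUIRED_TYPES = {"birthdate": "date", "phone": "string", "postal": "string"}
# Tags the table above says are applied once per feed.
ONCE_PER_FEED = {"given-name", "surname", "generational-suffix"}

def check_feed_tags(fields):
    """fields is a list of (column_name, data_type, semantic_tag_or_None) tuples."""
    problems = []
    columns_by_tag = {}
    for column, data_type, tag in fields:
        if tag is None:
            continue
        expected = REQUIRED_TYPES.get(tag)
        if expected and data_type.lower() != expected:
            problems.append(f"{column}: {tag} requires data type {expected}")
        columns_by_tag.setdefault(tag, []).append(column)
    for tag in sorted(ONCE_PER_FEED):
        if len(columns_by_tag.get(tag, [])) > 1:
            problems.append(f"{tag} applied more than once: {columns_by_tag[tag]}")
    return problems

def last_ten_digits(phone):
    """Keep only digits, then return the last 10 of them."""
    digits = "".join(ch for ch in phone if ch.isdigit())
    return digits[-10:]

# Hypothetical feed definition with two problems.
fields = [
    ("dob", "String", "birthdate"),
    ("first", "String", "given-name"),
    ("nick", "String", "given-name"),
    ("zip", "String", "postal"),
]
for problem in check_feed_tags(fields):
    print(problem)

print(last_ten_digits("+1 (206) 555-0100"))   # 2065550100
print(last_ten_digits("206-555-0100 x123"))   # 5550100123
```

The second phone example shows why extensions should be removed: the extension digits displace the area code when only the last 10 digits are kept.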
Keys
Keys are used to identify signals in source data that can be applied during the Stitch process. For example, a table that contains customer records automatically assigns the pk semantic to any field identified as a primary key. For tables that contain interaction records, a foreign key is often used to associate important fields for interaction records to primary keys for customer records. This allows interaction records to be correlated with the Amperity ID as an outcome of the Stitch process even though interaction records are (typically) not processed by Stitch for the purpose of identity resolution.
The following table describes the keys that are used to tag data:
Semantic Name | Datatype | Description |
---|---|---|
bk-[label] | String | A blocking key defines a specific combination of characters to be used as a blocking strategy. For example, the first three characters in given-name, the first character in surname, and birthdate represent a blocking key. Tip: A foreign key can be labeled as a blocking key to force Stitch to score all records that match on the blocking key, but without assigning them a 5.0 score. |
ck | String | The ck semantic tag may be applied to a column that contains pre-existing, tenant-specific customer IDs. When customer keys are applied, Amperity compares them to the Amperity ID as part of the deduplication process. |
fk-[namespace] | String | A foreign key is a column in a data table that acts as a primary key and can be used for deterministic matching of records. A record pair is assigned an exact match score (5.0) when foreign keys contain identical values during pairwise comparison. The fk-[namespace] semantic tag identifies a field as a foreign key. A foreign key semantic tag must be namespaced. For example: fk-customer, fk-interaction, fk-audience, or fk-brand. A foreign key semantic tag may be applied to any column in any data source, but should be associated with a field that can also act as a primary key for that data source and is present in other tables. A foreign key may be used once within a table. A table may have more than one foreign key. For example, if a data source contains customer and audience identifiers, apply fk-customer to the customer identifier and fk-audience to the audience identifier. Note: If foreign keys are linked together by a trivial duplicate, they will appear in the Unified Preprocessed Raw table as a comma-separated list. |
pk | String | A primary key is a column in a data table that uniquely identifies each row in a data source or data table. The combination of data source and primary key allows Amperity to uniquely identify every row in every data table across the entirety of customer data input to Amperity. |
sk | String | A separation key (sk) is used for deterministic unmatching of records. The sk-[semantic] semantic tag is a namespaced key that matches a customer profile semantic tag and is applied to a field that contains matching customer profile data. For example: sk-birthdate matches birthdate and sk-surname matches surname. Important: Amperity derives separation keys for sk-given-name and sk-generational-suffix automatically. You may apply more than one separation key within a table; however, each unique separation key may only be applied once. All separation key semantic tags must be namespaced to match the profile semantic for the same field. Important: A separation key may also be tagged as a foreign key. Tagging the same field as a foreign and separation key can be useful when customer data has a strong identifier that is also associated with an important profile semantic tag, such as phone or email. |
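The blocking key example and the foreign key exact match score described in the table above can be sketched in a few lines of Python. This is a simplified illustration of the two ideas, not how Stitch is implemented; the function names and the record layout are hypothetical.

```python
EXACT_MATCH_SCORE = 5.0  # score the table above gives to identical foreign keys

def blocking_key(record):
    """First three characters of given-name, first character of surname, and birthdate."""
    return record["given-name"][:3] + record["surname"][:1] + record["birthdate"]

def foreign_key_score(a, b, fk="fk-customer"):
    """Return 5.0 when both records carry the same non-empty foreign key, else None."""
    if a.get(fk) and a.get(fk) == b.get(fk):
        return EXACT_MATCH_SCORE
    return None

# Hypothetical record pair.
a = {"given-name": "Justin", "surname": "Currie", "birthdate": "1980-01-02", "fk-customer": "C-100"}
b = {"given-name": "Justine", "surname": "Curry", "birthdate": "1980-01-02", "fk-customer": "C-100"}

print(blocking_key(a))                     # JusC1980-01-02
print(blocking_key(a) == blocking_key(b))  # True
print(foreign_key_score(a, b))             # 5.0
```

Records that share a blocking key are compared to each other, while records that share a foreign key value are matched deterministically.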
Product catalogs
Product catalog semantics may be applied to data sources that contain product catalog data. Product semantics may be applied alongside other semantics, depending on the data source. Use the built-in list of semantics when building a feed. Product catalog semantics are prefixed with pc/ in the semantics drop-down menu in the Feed Editor. Use the combination of product semantic tags that best describes the structure of your product catalog.
Important
The Unified Product Catalog table represents the taxonomy for your products and brands. Attributes are added to the Unified Product Catalog table when pc/ semantic tags are applied to your data sources. All pc/ semantic tags are optional. Use the ones that best define the shape of your product catalog and best describe the individual items within it. The product ID is used as an input to predictive modeling.
The following table lists the tags available to this semantic group (with required semantic tags noted by “Required.” and recommended semantic tags noted by “Recommended.”):
Semantic name | Datatype | Description |
---|---|---|
product-brand | String | Optional. The brand name of a product or item. |
product-brand-id | String | Optional. The ID for the brand name of a product or item. |
product-category | String | Recommended. A category to which a product belongs. Use this semantic tag to identify how a customer categorizes individual products within their product catalog. |
product-category-id | String | Optional. The ID for the category to which a product belongs. |
product-class | String | Optional. The name of the class (or grouping) to which a product or item belongs. |
product-class-id | String | Optional. The ID for the name of the class (or grouping) to which a product or item belongs. |
product-collection | String | Optional. The name of the collection to which a product or item belongs. |
product-collection-id | String | Optional. The ID for the name of the collection to which a product or item belongs. |
product-color | String | Optional. The color of a product or item. |
product-color-id | String | Optional. The ID for the color of a product or item. |
product-department | String | Optional. The department to which a product or item belongs. |
product-department-id | String | Optional. The ID for the department to which a product or item belongs. |
product-description | String | Recommended. A description of the product. |
product-division | String | Optional. The division to which a product or item belongs. |
product-division-id | String | Optional. The ID for the division to which a product or item belongs. |
product-fabric | String | Optional. The fabric used for a product or item. |
product-fabric-id | String | Optional. The ID for the fabric used for a product or item. |
product-gender | String | Recommended. Apply this as a custom semantic tag to fields that contain a list of gender options for products. For example: F, M, unisex, NULL (for unknown). |
product-group | String | Optional. The group to which a product or item belongs. |
product-id | String | Optional. The unique identifier for a product. Important: Predictive modeling requires a product catalog to contain between 20 and 2000 unique product IDs. A product ID is often associated with a stock keeping unit (SKU). A SKU is an identifier that captures all of the unique details of any individual product, including specific attributes that differentiate by color, size, material, and so on. For example, a shirt with the same color and material, but with three different sizes would be represented by three unique SKUs and would also be represented by three unique product IDs. Each customer has their own definition for product IDs and SKUs. Be sure to understand this definition before applying semantic tags to fields with product IDs to ensure they accurately reflect the customer’s definition and meet the requirements for predictive modeling (if enabled). |
product-material | String | Optional. The material used for a product or item. |
product-material-id | String | Optional. The ID for the material used for a product or item. |
product-msrp | String | Optional. The manufacturer’s suggested retail price (MSRP) for a product or item. The MSRP is the price before shipping costs, taxes, and/or discounts have been applied. MSRP is sometimes referred to as the base price. |
product-name | String | Optional. The name of the product or item. |
product-season | String | Optional. The season with which a product or item is associated. |
product-season-id | String | Optional. The ID for the season with which a product or item is associated. |
product-silhouette | String | Optional. |
product-size | String | Optional. The size of a product or item. |
product-size-id | String | Optional. The ID for the size of a product or item. |
product-sku | String | Optional. The stock keeping unit, or SKU, for the product or item. A SKU is an identifier that captures all of the unique details of any individual product, including specific attributes that differentiate by color, size, material, and so on. |
product-style | String | Optional. The style of a product or item. |
product-subcategory | String | Recommended. The subcategory or secondary variant to which a product belongs. |
product-subcategory-id | String | Optional. The ID for the subcategory or secondary variant to which a product belongs. |
product-subclass | String | Optional. The subclass to which a product or item is assigned. |
product-subclass-id | String | Optional. The ID for the subclass to which a product or item is assigned. |
product-subdepartment | String | Optional. The sub-department to which a product or item is assigned. |
product-subdepartment-id | String | Optional. The ID for the sub-department to which a product or item is assigned. |
product-type | String | Optional. The type assigned to a product or item. |
product-upc | String | Optional. The UPC code for the product or item. A Universal Product Code (UPC or UPC code) is a barcode that is widely used to track items in stores. |
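Because predictive modeling expects a product catalog to contain between 20 and 2000 unique product IDs, it can be useful to count them before tagging a field with product-id. The following Python sketch shows one way to do that on extracted catalog rows. The function name is hypothetical, and treating both ends of the range as inclusive is an assumption.

```python
def product_id_count_ok(catalog_rows, minimum=20, maximum=2000):
    """Return (within_range, unique_count) for the product-id values in the rows."""
    unique_ids = {row["product-id"] for row in catalog_rows if row.get("product-id")}
    return minimum <= len(unique_ids) <= maximum, len(unique_ids)

# Hypothetical catalog: one shirt in three sizes yields three product IDs.
small_catalog = [
    {"product-id": "SHIRT-BLUE-S"},
    {"product-id": "SHIRT-BLUE-M"},
    {"product-id": "SHIRT-BLUE-L"},
    {"product-id": "SHIRT-BLUE-L"},  # repeated row, counted once
]
print(product_id_count_ok(small_catalog))  # (False, 3)

larger_catalog = [{"product-id": f"SKU-{n}"} for n in range(25)]
print(product_id_count_ok(larger_catalog))  # (True, 25)
```

A count far outside the range can indicate that the field holds a value finer or coarser than the customer's definition of a product ID.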
Transactions
An itemized transactions semantic is a way to identify brands, channels, stores, orders, products, quantities, per-item costs, total costs, and so on. Use itemized transactions semantics when a data source contains one row per item.
Itemized transaction semantics should be applied to data sources that contain records for individual items in a transaction. Itemized transaction semantics may be applied alongside other semantics, depending on the data source. Use the built-in list of semantics when building a feed.
Itemized transaction semantics are prefixed with txn-item/ in the semantics drop-down menu in the Feed Editor.
Important
This collection of semantic tags is used by Amperity to build the Unified Itemized Transactions table. Each semantic tag is directly associated with a column in that table. For example, values identified by the is-cancellation, item-cost, and order-id semantic tags are added to the is_cancellation, item_cost, and order_id columns, respectively.
The Unified Itemized Transactions table contains rows of transactional data summarized to the item level, and then coalesced into a single row for each unique combination of order ID and product ID. The order ID is associated with an Amperity ID.
Carefully review the data in the Unified Itemized Transactions table, including column values that are calculated from values in other columns in this table or the Unified Transactions table, to verify their accuracy and to ensure that associated semantic tags have been applied correctly.
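The two behaviors described above, the mapping from semantic tag to column name and the coalescing of rows by order ID and product ID, can be illustrated with a short Python sketch. This is a simplified illustration and not how Amperity builds the table. The hyphen-to-underscore rule is inferred from the three examples given above, summing revenue along with quantity is an assumption, and both function names are hypothetical.

```python
def column_name(semantic_tag):
    """Map a tag such as is-cancellation to a column such as is_cancellation."""
    return semantic_tag.replace("-", "_")

def coalesce_items(rows):
    """Merge rows that share (order_id, product_id), adding up quantity and revenue."""
    merged = {}
    for row in rows:
        key = (row["order_id"], row["product_id"])
        if key not in merged:
            merged[key] = dict(row)  # copy so the input rows are not modified
        else:
            merged[key]["item_quantity"] += row["item_quantity"]
            merged[key]["item_revenue"] += row["item_revenue"]
    return list(merged.values())

print([column_name(t) for t in ("is-cancellation", "item-cost", "order-id")])
# ['is_cancellation', 'item_cost', 'order_id']

# Hypothetical source rows: product P1 appears twice on order A1.
rows = [
    {"order_id": "A1", "product_id": "P1", "item_quantity": 1, "item_revenue": 20.0},
    {"order_id": "A1", "product_id": "P1", "item_quantity": 1, "item_revenue": 20.0},
    {"order_id": "A1", "product_id": "P2", "item_quantity": 1, "item_revenue": 5.0},
]
for row in coalesce_items(rows):
    print(row)
```

After coalescing, order A1 has one row for P1 with a quantity of 2 and one row for P2.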
The following table lists the tags available to this semantic group (with required semantic tags noted by “Required.” and recommended semantic tags noted by “Recommended.”):
Semantic name | Datatype | Description |
---|---|---|
[custom-semantic] | String | Required. Use a foreign key (recommended) or a custom semantic tag. Important: See fk-[namespace]. At least one field must have the [custom-semantic] or fk-[namespace] semantic tags applied to it to support downstream processing requirements for interaction records. You may apply more than one, or use a combination, of these semantic tags. When a custom semantic tag is added to itemized transactions data it: |
currency | String | Optional. Currency represents the type of currency that was used to pay for an item. For example: dollar. Note: Currency must be consistent across all orders from the same data source. |
digital-channel | String | Optional. The digital channel through which a transaction was made. For example: Facebook, Google Ads, email, etc. Note: This semantic tag should only be used when purchase-channel specifies an online channel. |
fk-[namespace] | String | Required. The fk-[namespace] semantic tag identifies a field as a foreign key. A foreign key semantic tag must be namespaced. For example: fk-customer, fk-interaction, fk-audience, or fk-brand. A namespaced foreign key must be present in interaction records that contain transactions data. A foreign key may be used along with a customer ID. Important: See [custom-semantic]. At least one field must have the fk-[namespace] or [custom-semantic] semantic tags applied to it to support downstream processing requirements for interaction records. You may apply more than one, or use a combination, of these semantic tags. When a foreign key is added to transactions data it: |
is-cancellation | Boolean | Required. A flag that indicates if the item was canceled. Important: The field to which the is-cancellation semantic is applied must represent a value that is Note: The is-cancellation and is-return semantic tags may not be applied to the same field. |
is-return | Boolean | Required. A flag that indicates if the item was returned. Important: The field to which the is-return semantic is applied must represent a value that is Note: The is-cancellation and is-return semantic tags may not be applied to the same field. |
item-cost | Decimal | Optional. Item cost is the cost to produce all units of an item. Note: This value must be greater than or equal to 0 for purchases, but less than or equal to 0 for returns or cancellations. |
item-discount-amount | Decimal | Optional. Item discount amount is the discount amount that is applied to all units that are associated with a single item within a single transaction. This value should equal item quantity multiplied by the unit discount amount. This value is used by Amperity for discount sensitivity analysis. Note: This value must be greater than or equal to 0 for purchases, but less than or equal to 0 for returns or cancellations. |
item-discount-percent | Decimal | Optional. Item discount percent is the percentage discount that is applied to all units that are associated with a single item within a single transaction. This value is used by Amperity for discount sensitivity analysis. Note: This value must be between 0 and 1. |
item-list-price | Decimal | Optional. Item list price is the manufacturer’s suggested retail price (MSRP) for all units of this item. The MSRP is the price before shipping costs, taxes, and/or discounts have been applied. MSRP is sometimes referred to as the base price. This value should equal item revenue plus item discount amount. Note: This value must be greater than or equal to 0 for purchases, but less than or equal to 0 for returns or cancellations. |
item-profit | Decimal | Optional. Item profit represents the amount of profit that is earned when all units of an item are sold. Note: This value must be greater than or equal to 0 for purchases, but less than or equal to 0 for returns or cancellations. |
item-quantity | Integer | Required. Item quantity is the total number of items in an order. When an item has been returned or an order has been canceled, item quantity is the total number of items that were returned and/or canceled. Note: This value must be less than or equal to 0 when is-return or is-cancellation are true. |
item-revenue | Decimal | Required. The total revenue for all units of an item, after discounts are applied. When an item has been returned or the order has been canceled, this is the total revenue for all items that were returned and/or canceled. This value should equal item list price minus item discount amount. Note: This value must be less than or equal to 0 when is-return or is-cancellation are true. |
item-subtotal | Decimal | Optional. An item subtotal is the amount for an item, before discounts are applied. This value should equal unit list price times item quantity. This value is used by Amperity to calculate discounts for discount sensitivity analysis. Note: This value must be greater than or equal to 0 for purchases, but less than or equal to 0 for returns or cancellations. |
item-tax-amount | Decimal | Optional. An item tax amount is the total amount of taxes that are associated with the purchase of an item. Note: This value must be greater than or equal to 0 for purchases, but less than or equal to 0 for returns or cancellations. |
order-datetime | Datetime | Required. Order datetime is the date (and time) on which an order was placed. The order date: Note: Other dates associated with an order that are not specific to a transaction, such as dates associated with hotel stays and reservations, should be added to the Unified Product Catalog table. |
order-discount-amount | Decimal | Required. Order discount amount is the total discount amount that is applied to the entire order. This tag provides the following data: Note: This field appears as a positive for a purchase and a 0 for a cancellation or return for minimum, maximum, and total order discount amounts in the Unified Transactions table. |
order-id | String | Required. An order ID is the unique identifier for the order and links together all of the items that were part of the same transaction. When an item has been returned or when an order has been canceled, the order ID is the unique identifier for the original order, including the returned or canceled items. Note: The order ID should never change, even when an item in the order is returned or canceled. Important: If order IDs are recycled and/or are otherwise not guaranteed to be unique over time, the unique identifier for the order must be updated to be a combination of the order ID and the date on which the order occurred. This must be done using domain SQL similar to: |
payment-method | String | Optional. A payment method is how a customer chose to pay for the items they have purchased. For example: credit card, gift card, or cash. |
Product catalogs | String | Optional. Product catalog semantics may be applied to data sources that contain product catalog data. There are two sets of product catalog semantic tags: txn-item/ and pc/. Important: The names of the semantic tags that are available for product catalogs are identical. For example: “product-brand”, “product-category”, and “product-gender”. The difference is the prefix that you choose to use and the pattern your tenant chooses for defining your product catalog within Amperity. You should determine which pattern you want to use early in your configuration and deployment process. Talk with your Amperity representative if you have questions about the best ways to approach this within your tenant. To review the descriptions for all of the product catalog semantic tags you may prefix with txn-item/, refer to the section in this topic about product catalog semantic tags. |
product-id | String | Required. The unique identifier for a product. A stock keeping unit (SKU) is an identifier that captures all of the unique details of any individual product, including specific attributes that differentiate by color, size, material, and so on. For example, a shirt with the same color and material, but with three different sizes would be represented by three unique SKUs and would also be represented by three unique product IDs. For data that contains itemized transactions, where a single transaction includes more than one of the same product, the product ID must appear only once per order ID in the Unified Itemized Transactions table. Multiple instances of the same product must be added to the item quantity in the same row. Caution: Every customer has their own definition for SKUs and product IDs. Be sure to understand this definition before applying semantic tags to fields with product IDs to ensure they accurately reflect the customer’s definition. |
purchase-brand | String | Required. The brand for which a transaction was made. Caution: This semantic tag should only be used when interaction records contain transaction data for more than one brand. |
purchase-channel | String | Required. A purchase channel is the channel from which a transaction was made. For example: in-store or online. |
store-id | String | Required. A store ID is a unique identifier that is associated with the location of a store. |
unit-cost | Decimal | Optional. Unit cost is the cost to produce a single unit of one item. Note: This value must be greater than or equal to 0 for purchases, but less than or equal to 0 for returns or cancellations. |
unit-discount-amount | Decimal | Optional. Unit discount amount is the discount amount that is applied to a single unit of one item. This discount is often applied to all units of the same item within a single transaction. This value is used by Amperity for discount sensitivity analysis. Note: This value must be greater than or equal to 0 for purchases, but less than or equal to 0 for returns or cancellations. |
unit-list-price | Decimal | Optional. Unit list price is the manufacturer’s suggested retail price (MSRP) for a single unit of an item. The MSRP is the price before shipping costs, taxes, and/or discounts have been applied. MSRP is sometimes referred to as the base price. This value should equal the unit discount amount plus the unit subtotal. Note: This value must be greater than or equal to 0 for purchases, but less than or equal to 0 for returns or cancellations. |
unit-profit | Decimal | Optional. Unit profit represents the amount of profit that is earned when a single unit of an item is sold. Note: This value must be greater than or equal to 0 for purchases, but less than or equal to 0 for returns or cancellations. |
unit-revenue | Decimal | Optional. The total revenue for a single unit of an item. When an item has been returned or the order has been canceled, this is the total revenue for a single unit of an item that was returned and/or canceled. Note: This value must be less than or equal to 0 when is-return or is-cancellation are true. |
unit-subtotal | Decimal | Optional. A unit subtotal is the amount for a single unit of one item, before discounts have been applied. This value is used by Amperity to calculate discounts for discount sensitivity analysis. Note: This value must be greater than or equal to 0 for purchases, but less than or equal to 0 for returns or cancellations. |
unit-tax-amount | Decimal | Optional. A unit tax amount is the total amount of taxes that are associated with a single unit. Note: This value must be greater than or equal to 0 for purchases, but less than or equal to 0 for returns or cancellations. |
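Several of the notes in the table above are arithmetic rules that can be checked on source data before semantic tags are applied. The following Python sketch checks four of them for a single row: the sign of item-quantity and item-revenue on returns and cancellations, the 0 to 1 range for item-discount-percent, item revenue as item list price minus item discount amount, and item subtotal as unit list price times item quantity. The function name `check_item` and the row layout are hypothetical, and this sketch is not a validation feature of Amperity.

```python
from decimal import Decimal

def check_item(row):
    """Return a list of rule violations for one itemized transaction row."""
    problems = []
    if row["is-return"] or row["is-cancellation"]:
        for tag in ("item-quantity", "item-revenue"):
            if row[tag] > 0:
                problems.append(f"{tag} must be <= 0 for a return or cancellation")
    percent = row.get("item-discount-percent")
    if percent is not None and not (0 <= percent <= 1):
        problems.append("item-discount-percent must be between 0 and 1")
    if "item-list-price" in row and "item-discount-amount" in row:
        expected = row["item-list-price"] - row["item-discount-amount"]
        if row["item-revenue"] != expected:
            problems.append("item-revenue should equal item-list-price minus item-discount-amount")
    if "unit-list-price" in row and "item-subtotal" in row:
        expected = row["unit-list-price"] * row["item-quantity"]
        if row["item-subtotal"] != expected:
            problems.append("item-subtotal should equal unit-list-price times item-quantity")
    return problems

# Hypothetical purchase row that satisfies every rule checked above.
purchase = {
    "is-return": False, "is-cancellation": False, "item-quantity": 2,
    "item-revenue": Decimal("40.00"), "item-list-price": Decimal("50.00"),
    "item-discount-amount": Decimal("10.00"), "item-discount-percent": Decimal("0.2"),
    "unit-list-price": Decimal("25.00"), "item-subtotal": Decimal("50.00"),
}
# Hypothetical return row with positive values and a percent expressed as 20.
bad_return = {
    "is-return": True, "is-cancellation": False, "item-quantity": 1,
    "item-revenue": Decimal("20.00"), "item-discount-percent": Decimal("20"),
}
print(check_item(purchase))    # []
print(check_item(bad_return))  # three violations
```

Decimal values are used so that the equality checks are not affected by floating-point rounding.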
Select last updated field
Amperity requires each feed to specify a field that describes when each record was last updated. If multiple records in the incoming data and/or the existing domain table have the same primary key, the record with the most recent “last updated” field will be retained. This may be associated with a field that has a datetime field type, or an integer (such as for unix timestamps).
Note
Amperity does not use a field with a date data type because that value is not granular enough to determine priority.
If the data has no such field, you can choose to autogenerate one. In that case, the following logic determines which record to keep when a primary key appears more than once:
Records from newly-ingested data will always overwrite records that already exist in the domain table.
If couriers are run over a date range, records from files associated with later dates will be retained.
If multiple files are loaded for the same date, records for the latest-loaded file are retained. File loading order depends on the behavior of the source system, but is generally deterministic.
If the same primary key appears on multiple records in the same text-based file, the latest row in the file is retained.
Note
When using ingest queries, the above tiebreakers are unavailable, so upserting behavior can be nondeterministic. Ensure that you either specify a “last updated” field, or that your ingest query only returns a single record for each primary key, to ensure deterministic results.
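The retention rule described in this section, where the record with the most recent "last updated" value wins for each primary key, can be sketched in Python as follows. This is a simplified illustration of upsert behavior and not Amperity's implementation. The function name and field names are hypothetical, and letting an incoming record win a tie is an assumption made for the sketch.

```python
def upsert(domain_table, incoming, primary_key, last_updated):
    """Keep one record per primary key: the one with the most recent last-updated value."""
    table = {row[primary_key]: row for row in domain_table}
    for row in incoming:
        current = table.get(row[primary_key])
        # Assumption: on a tie, the incoming record replaces the existing one.
        if current is None or row[last_updated] >= current[last_updated]:
            table[row[primary_key]] = row
    return list(table.values())

# Hypothetical data using integer (unix-style) timestamps.
existing = [{"id": 1, "updated": 10}, {"id": 2, "updated": 20}]
incoming = [{"id": 2, "updated": 15}, {"id": 2, "updated": 30}, {"id": 3, "updated": 5}]

result = upsert(existing, incoming, primary_key="id", last_updated="updated")
print(len(existing) + len(incoming))  # 5
print(len(result))                    # 3
print(sorted(result, key=lambda row: row["id"]))
```

Five records were seen in total, but only three remain after the upsert, which is why the number of records in a domain table can differ from the number of records loaded.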
To set the last updated field
From the Sources page, open the menu for a feed, and then select Edit. The Feed Editor page opens.
The last updated field is above the field list in the center of the page.
Under Last Updated Field, choose how Amperity will determine priority: automatically generated, a field with an integer data type, or a field with a datetime data type (often the same field to which the update-dt merge rules semantic tag is applied).
Click Activate.
Tip
After loading data, the number of records in a domain table may not match the number of records loaded by Amperity. Amperity uses an UPSERT process when loading data and determines priority based on the Last Updated Field. If a large difference exists, take a close look at the primary key to determine whether it is the cause.
Make available to Stitch
A domain table with semantic tags applied to records that contain PII data should be made available to Stitch. A domain table that is made available to Stitch is used by Stitch for customer identity resolution.
Domain table data is made available to Stitch in two steps:
Selecting the Make available to Stitch option when configuring a feed or a custom domain table.
When selected, the name of the domain table that is associated with the feed or custom domain table is added to a list of domain tables that are accessible as a Stitch configuration setting.
Configuring each table in the list of domain tables within Stitch configuration that have been made available to Stitch. Each table in this list must be configured for Stitch before it can be part of the identity resolution process.
Tip
Only tables that contain PII data should be made available to Stitch. Tables that are later associated with Amperity IDs, but do not contain PII data, such as those that contain transactions, should use a foreign key to associate those records with an Amperity ID.
To make data available to Stitch
From the Sources page, open the menu for a feed, and then select Edit. The Feed Editor page opens.
Under Domain Table select Make available to Stitch.
Click Activate.