About AmpID

AmpID resolves customer identities across all of your customer records by applying proprietary machine learning algorithms.

Every system has its own way of identifying customers. The longer a person interacts with your brand, the more fragmented their identity becomes. This leads to inaccurate insights, misattributed segments, and personalization that isn’t personal. AmpID correctly identifies each of your unique individual customers, allowing you to drive the customer experiences you want.

  • AmpID is a multi-patented process that runs on a daily basis, unifying records that other approaches to identity resolution routinely over- and under-connect.

  • AmpID can do this across a large number of data sources, including online and offline transactions, clickstream, loyalty, email, and more.

  • AmpID assigns a unique identifier to each unique individual: the Amperity ID. This ID is a stable, universal identifier that spans loyalty programs, email, transactions and all other systems.

  • The Amperity ID remains stable over time, even as new data is provided to Amperity as customers engage and provide new PII.

Identity resolution

Identity (ID) resolution is the process of connecting and matching data points across multiple devices and channels to form a unified view of a single customer. It lets brands connect fragmented data into a complete picture of an actual person.

The goal of identity resolution is to identify the same individual within and across all data sources that contain customer information.

Identity resolution.

A complete view of your customers must combine each individual’s transactions from multiple sources, including point-of-sale, e-commerce, email interactions, loyalty programs, and mobile app engagement. It must include historical data as well as current data that’s produced as customers interact with your brand.

Transitive connections

A transitive connection exists between two records when each has a strong match to the same intermediate record, but the two records do not strongly match each other. For example: record 1 matches record 2 and record 3 matches record 2. Records 1 and 3 do not match each other, but they have a transitive connection because both match record 2.

Amperity identifies transitive connections using a patented clustering algorithm that eliminates record pairs with genuine conflicts while preserving record pairs that do not match but also have no genuine conflicts.
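The idea of preserving transitive connections while rejecting genuine conflicts can be sketched in Python. This is a simplified greedy illustration, not Amperity's patented clustering algorithm: it assumes strong matches and genuine conflicts have already been identified as pairs of record ids, merges clusters along strong matches, and skips any merge that would put a conflicting pair in the same cluster. The function name `cluster_with_conflicts` and its inputs are hypothetical.

```python
from itertools import product


def cluster_with_conflicts(records, strong_matches, conflicts):
    """Cluster records through transitive strong matches, refusing any
    merge that would put a genuinely conflicting pair in one cluster.

    records: iterable of record ids
    strong_matches: (a, b) pairs, processed in the order given
    conflicts: set of frozenset({a, b}) pairs with a genuine conflict
    """
    # Every record starts as its own cluster (record id -> member set).
    members = {r: {r} for r in records}

    for a, b in strong_matches:
        cluster_a, cluster_b = members[a], members[b]
        if cluster_a is cluster_b:
            continue  # already connected, directly or transitively
        # Skip the merge if any cross-cluster pair is a genuine conflict.
        if any(frozenset(p) in conflicts for p in product(cluster_a, cluster_b)):
            continue
        merged = cluster_a | cluster_b
        for r in merged:
            members[r] = merged  # all members share one set object

    # Collapse to distinct clusters (by object identity), sorted for display.
    distinct = {id(c): c for c in members.values()}
    return sorted(sorted(c) for c in distinct.values())


# Records 1 and 3 both match record 2; record 4 matches 2 but conflicts with 1.
print(cluster_with_conflicts([1, 2, 3, 4], [(1, 2), (3, 2), (4, 2)],
                             {frozenset({1, 4})}))  # → [[1, 2, 3], [4]]
```

Records 1 and 3 never match each other, yet they share a cluster through record 2. Record 4 also matches record 2 but stays separate because of its conflict with record 1. Because a greedy approach like this depends on the order in which matches are processed, a production algorithm needs a more careful strategy.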

The following example shows transitive connections within a cluster graph as it might appear on the Cluster Graph tab in the Data Explorer. Although the PII in this customer’s point-of-sale transactions (A and B) does not match the PII in their online purchases (C, D, G, and F), these records have a transitive connection through their shared relationships with the loyalty program (H).

Transitive connections.

Common workflows

The most common workflows for AmpID use ID graphs to associate the Amperity ID with your customer data. AmpID uses AI-powered probabilistic identity resolution to assign the Amperity ID to all customers in the data, which enables the following workflows:

  • First-party customer ID graphs

  • Householding customer ID graphs

  • Third-party customer ID graphs

  • Exploring Stitch results across all of your customer data

Use SQL segment editors to query ID graphs and QA data for:

  • Linking the Amperity ID (first-party) to third-party provider person IDs

  • Building a true count of a brand’s customers

  • Exporting ID graphs to downstream workflows

  • Using data hygiene with third-party data providers to verify the accuracy of PII

  • Standardizing data for certain PII details

  • Linking unknown IDs to known customers

  • Mapping anonymous users to known customers

About Stitch

Stitch uses patented algorithms to evaluate massive volumes of data to discover the hidden connections in your customer records that identify unique individuals. Stitch outputs a unified collection of data that assigns a unique identifier to each unique individual that is discovered within your customer records.

The Stitch tab shows detailed results of the Stitch process, which takes customer data, extracts customer records, and then compares record pairs using over 40 different machine learning models. Each record pair is given a score, which represents the strength of the match. Amperity creates clusters of records based on the connections between pairs, and then gives each cluster a unique Amperity ID.

The Stitch tab in Amperity.

Explore Amperity IDs

An Amperity ID is a patented unique identifier that is assigned to clusters of customer records. A single Amperity ID represents a single individual. Unlike other systems, the Amperity ID is reassessed every day for the most comprehensive view of your customers.

As new data is added to Amperity, the Stitch process identifies when new or changed data applies to existing clusters of customer records, and then updates those records, maintains the cluster, and retains a stable Amperity ID assignment. A new Amperity ID is created only when a new individual is identified.
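One way to picture how an ID can stay stable across daily runs is a majority-vote reuse rule, sketched below in Python. This is an assumption for illustration only; the source does not describe how Amperity carries IDs forward. Each new cluster inherits the most common ID its records held in the previous run, unless a larger cluster has already claimed that ID, and only clusters with no available prior ID receive a fresh one. `assign_stable_ids` and its uuid-based `new_id` default are hypothetical.

```python
import uuid
from collections import Counter


def assign_stable_ids(clusters, previous_ids, new_id=lambda: str(uuid.uuid4())):
    """Give each current cluster an ID, reusing IDs from the previous run.

    clusters: list of sets of record ids from the current run
    previous_ids: dict of record id -> ID assigned in the previous run
    new_id: callable that mints a fresh ID (hypothetical default: uuid4)
    Returns a dict of record id -> ID.
    """
    assignments = {}
    claimed = set()
    # Larger clusters go first, so they keep an inherited ID when a cluster splits.
    for cluster in sorted(clusters, key=len, reverse=True):
        votes = Counter(previous_ids[r] for r in cluster if r in previous_ids)
        # Most common previous ID that no other cluster has claimed yet.
        unclaimed = [i for i, _ in votes.most_common() if i not in claimed]
        cluster_id = unclaimed[0] if unclaimed else new_id()
        claimed.add(cluster_id)
        for r in cluster:
            assignments[r] = cluster_id
    return assignments


previous = {"r1": "A", "r2": "A", "r3": "B"}
current = [{"r1", "r2", "r4"}, {"r3"}, {"r5"}]  # r4 and r5 are new records
print(assign_stable_ids(current, previous, new_id=lambda: "NEW-1"))
# r1, r2, and r4 keep "A"; r3 keeps "B"; only r5 gets "NEW-1"
```

New record r4 joins an existing cluster and picks up its ID, while only r5, which matches no existing cluster, triggers a new ID.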

Explore data sources

The Stitched Sources section of the Stitch tab compares domain tables and the record pairs identified both within each data source and across all data sources. This comparison is presented as an UpSet plot chart with links to the underlying data sources in the Data Explorer.

The following diagram shows the components of the UpSet plot chart, including the distribution of Amperity IDs across all data sources and, for each data source, a breakdown of how that data source compares to all other data sources. (An UpSet plot chart has a row for each data source; this diagram shows only the first two.)

An UpSet plot chart, located within the Stitch tab in Amperity.

Each stitched data source can be explored from the UpSet plot, which includes a source-by-source breakdown of stitched data. For each data source, a View source link is available. This link opens the Data Explorer and displays a Schema for the data source that shows the name of each field as it is defined in customer data, its data type, the Amperity semantic applied to it, and sample data. A Sample shows 100 records from that data source, with each field defined in the customer data source presented as a column.

Explore semantics

A semantic is a way to apply a common understanding to individual points of data across multiple data sources, even when data sources have different schemas, naming conventions, and levels of data quality. Assigning a semantic tag to individual columns in customer data is an important prerequisite to the Stitch process.

The Semantics link at the top of the Stitch tab opens a dialog box that lists the configured semantics made available to Stitch from domain tables. This list is broken down by domain table, and then by semantic. For each semantic, it lists the semantic, the data type (string, date, integer, and so on), and the name of the field as defined in customer data.
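The following minimal Python sketch shows how semantic tags let records from differently shaped tables be compared. The configuration mirrors the layout of the Semantics dialog (domain table, then semantic, data type, and field name), but the table names, field names, and tag names are hypothetical examples, and `to_semantic_record` is an illustrative helper, not an Amperity API.

```python
# Hypothetical semantic configuration, organized like the Semantics dialog:
# domain table -> list of (semantic tag, data type, field name in customer data).
SEMANTICS = {
    "Loyalty_Members": [
        ("email", "string", "EmailAddr"),
        ("given-name", "string", "FName"),
        ("birthdate", "date", "DOB"),
    ],
    "Ecomm_Orders": [
        ("email", "string", "customer_email"),
        ("given-name", "string", "first_name"),
    ],
}


def to_semantic_record(table, row):
    """Re-key a source row by semantic tag so rows from tables with
    different schemas can be compared field by field. Fields without a
    semantic are dropped; tagged fields missing from the row become None."""
    return {tag: row.get(field) for tag, _dtype, field in SEMANTICS[table]}


loyalty = to_semantic_record(
    "Loyalty_Members", {"EmailAddr": "jo@example.com", "FName": "Jo", "Tier": "Gold"}
)
order = to_semantic_record(
    "Ecomm_Orders", {"customer_email": "jo@example.com", "first_name": "Jo"}
)
print(loyalty["email"] == order["email"])  # → True
```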

Explore stitched data

The Stitch tab shows the outcome of the Stitch process, including the number of unique Amperity IDs in customer data and a series of charts that highlight the connectivity between data sources.

Cluster graphs

Clustering is the process of deciding which records are included in a customer profile. A matching threshold defines the minimum score at which two records can be matched and then included in a cluster. Record pairs that score below the threshold may still end up in the same cluster, but only through a transitive connection. Distinct customer profiles emerge as clusters of record pairs.
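The threshold rule can be sketched with a small union-find in Python. Assuming every candidate pair already has a numeric score, only pairs at or above the threshold join records into clusters; a pair below the threshold is labeled transitive when its records still land in the same cluster through other records. The scores, the threshold of 3, and the function name `cluster_by_threshold` are hypothetical.

```python
def cluster_by_threshold(records, scored_pairs, threshold):
    """Cluster records using only pairs scored at or above the threshold,
    then label each scored pair as direct, transitive, or unmatched."""
    parent = {r: r for r in records}

    def find(r):
        # Walk up to the cluster root, halving the path as we go.
        while parent[r] != r:
            parent[r] = parent[parent[r]]
            r = parent[r]
        return r

    # Only above-threshold pairs are allowed to join clusters.
    for a, b, score in scored_pairs:
        if score >= threshold:
            parent[find(a)] = find(b)

    labels = {}
    for a, b, score in scored_pairs:
        if score >= threshold:
            labels[(a, b)] = "direct"
        elif find(a) == find(b):
            labels[(a, b)] = "transitive"  # weak pair joined through other records
        else:
            labels[(a, b)] = "unmatched"

    clusters = {}
    for r in records:
        clusters.setdefault(find(r), set()).add(r)
    return sorted(sorted(c) for c in clusters.values()), labels


pairs = [("A", "H", 4), ("H", "C", 3), ("A", "C", 1), ("B", "C", 1)]
clusters, labels = cluster_by_threshold(["A", "B", "C", "H"], pairs, threshold=3)
print(clusters)             # → [['A', 'C', 'H'], ['B']]
print(labels[("A", "C")])   # → transitive
```

A and C score below the threshold but share a cluster through H, while B's equally weak match to C does not pull it into the cluster.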

A cluster graph is one of the outcomes of the Stitch process. It is a visual representation of every pairwise connection in a cluster of records. Each pair can be explored in more detail.

The Cluster Graph tab in the Data Explorer shows a graph with a line between each pair of connected stitched records, along with a detailed breakdown of PII similarities (and differences) for each pair of stitched records in the cluster graph.

The data explorer, showing the cluster graph.

Deduplication rates

The deduplication rate measures how much a customer data set is reduced when its records are resolved to unique individuals. It is the difference between the total number of original identifiers in customer data and the total number of Amperity IDs assigned to unique individuals, expressed as a percentage of the original identifiers.

Example

A tenant has three sources of customer records, represented by tables 1, 2, and 3. The Stitch report shows the following:

  • Total number of records is 314.1k

  • Total number of clusters is 212.0k

  • Overall deduplication rate is 32.5%

  • Individual deduplication rates for the three tables are 7.7%, 6.6%, and 0%

How is this possible? Let’s walk through it.

The overall deduplication rate (32.5%) compares the total number of records across all tables with the total number of Amperity IDs. A table can have a low deduplication rate on its own while still being highly connected to other tables.

An UpSet plot chart has a row for each table. In this case, the row for table 1 shows 117k source IDs and 108k Amperity IDs. This represents a 7.7% deduplication rate.

Deduplication rates for customer records.

Next, compare the overlap between tables 1 and 3 by hovering over the row for table 1. The hover box shows that more than 69k records are shared between tables 1 and 3. This is a significant amount of overlap between two tables and is the primary contributor to the 32.5% overall deduplication rate.
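To see how low per-table rates can coexist with a high overall rate, the following Python sketch computes both from stitched rows. The tuple layout `(table, source_id, amperity_id)`, the helper names, and the toy data are assumptions for illustration: every record in table 1 shares an Amperity ID with a record in table 3, so neither table deduplicates on its own, yet the overall rate is 50%.

```python
def dedup_rate(identifiers, amperity_ids):
    """Percentage reduction from source identifiers to Amperity IDs."""
    return 100 * (identifiers - amperity_ids) / identifiers


def dedup_report(stitched):
    """stitched: list of (table, source_id, amperity_id) tuples (assumed layout).
    Returns the overall rate and a dict of per-table rates."""
    overall = dedup_rate(len(stitched), len({aid for _, _, aid in stitched}))
    per_table = {}
    for table in sorted({t for t, _, _ in stitched}):
        ids = [aid for t, _, aid in stitched if t == table]
        per_table[table] = dedup_rate(len(ids), len(set(ids)))
    return overall, per_table


# Toy data: each person appears once in table 1 and once in table 3.
stitched = [("table 1", f"t1-{i}", f"P{i}") for i in range(4)] + [
    ("table 3", f"t3-{i}", f"P{i}") for i in range(4)
]
print(dedup_report(stitched))  # → (50.0, {'table 1': 0.0, 'table 3': 0.0})
```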

Deduplication rate, explained

The deduplication rate is the reduction that occurs when the total number of Amperity IDs is compared to the number of original source IDs provided in customer data. For example:

  1. Total records: 314.1k. The sum of all records from all tables.

  2. Total clusters: 212.0k. The number of unique clusters (Amperity IDs) across all tables.

  3. Records in table 1: 117k. The sum of all records in table 1.

  4. Clusters in table 1: 108k. The number of unique clusters (Amperity IDs) that contain records from table 1.

The overall deduplication rate is 32.5%:

100 * [(314.1k - 212.0k) / 314.1k] = 32.5%

The deduplication rate for table 1 is 7.7%:

100 * [(117k - 108k) / 117k] = 7.7%

Important

How deduplication rate is calculated depends on whether the database uses customer keys. The previous example shows the deduplication rate for a database that does not use customer keys:

(total customer records - Amperity IDs) / (total customer records)

When a database uses customer keys, the calculation is the same, but the starting point is the number of customer key records:

(total customer key records - Amperity IDs) / (total customer key records)
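The same arithmetic in Python, reproducing the worked example. `dedup_rate` is a hypothetical helper; pass the number of customer records as the first argument, or the number of customer key records when the database uses customer keys.

```python
def dedup_rate(total_identifiers, amperity_ids):
    """Deduplication rate as a percentage.

    total_identifiers: total customer records, or total customer key records
    when the database uses customer keys.
    """
    return 100 * (total_identifiers - amperity_ids) / total_identifiers


# Figures from the example: 314.1k records, 212.0k clusters; table 1: 117k, 108k.
print(round(dedup_rate(314_100, 212_000), 1))  # overall → 32.5
print(round(dedup_rate(117_000, 108_000), 1))  # table 1 → 7.7
```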

Pairwise connections

A pairwise connection is a pair of records within a block whose initial score is above threshold. Each pairwise connection within a block is scored; records joined by pairwise connections that score above threshold are then clustered, and each resulting cluster represents a single, unique individual.
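Blocking limits which record pairs are scored at all. In the minimal Python sketch below, records are grouped by a single blocking key, and only pairs that share a block become candidates for scoring. The normalized-email key and the `candidate_pairs` and `email_block` names are assumptions for illustration; the source does not say which blocking strategies Stitch applies.

```python
from collections import defaultdict
from itertools import combinations


def candidate_pairs(records, blocking_key):
    """Group records into blocks and return only the within-block pairs.

    records: dict of record id -> dict of semantic fields
    blocking_key: function returning a block label for a record, or None
    """
    blocks = defaultdict(list)
    for rid, rec in records.items():
        key = blocking_key(rec)
        if key is not None:  # records without a key join no block
            blocks[key].append(rid)
    pairs = set()
    for ids in blocks.values():
        for a, b in combinations(sorted(ids), 2):
            pairs.add((a, b))
    return sorted(pairs)


def email_block(rec):
    # Hypothetical blocking key: trimmed, lowercased email address.
    email = rec.get("email")
    return email.strip().lower() if email else None


RECORDS = {
    1: {"email": "Jo@Example.com "},
    2: {"email": "jo@example.com"},
    3: {"email": "sam@example.com"},
    4: {"email": None},
}
print(candidate_pairs(RECORDS, email_block))  # → [(1, 2)]
```

Four records would produce six possible pairs; blocking reduces that to one candidate pair to score.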

The Pairwise Connections tab in the Data Explorer shows a breakdown of stitched record pairs by score.

The data explorer, showing pairwise connections.

A score is assigned to every pairwise connection. The score has two parts, separated by a period.

The first part, the record pair score, reflects the match category that Amperity assigns to each record pair using a machine learning classifier: 5 for exact matches, 4 for excellent matches, 3 for high matches, 2 for moderate matches, 1 for weak matches, and 0 for no conflicts.

The second part, the record pair strength, is used by Stitch to help determine the quality of the record pair score. This value appears in the Stitch report as a number with two decimal places. A record pair strength by itself is not a direct indicator of the quality of a pairwise connection score.
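A minimal Python sketch of splitting a score into its two parts. It assumes the score is available as a string such as `"3.45"`, with the record pair score before the period and the record pair strength after it; the exact format and the names `parse_pair_score` and `PairScore` are assumptions. The strength is kept as its digits because it is not a quality indicator on its own and this sketch makes no assumption about its scale.

```python
from typing import NamedTuple

# Record pair score -> match category, as listed for the first part.
MATCH_CATEGORIES = {
    5: "exact",
    4: "excellent",
    3: "high",
    2: "moderate",
    1: "weak",
    0: "no conflicts",
}


class PairScore(NamedTuple):
    record_pair_score: int  # first part: the match category as a number
    strength: str           # second part: kept as digits; scale not assumed
    category: str


def parse_pair_score(score: str) -> PairScore:
    """Split a score such as "3.45" (assumed format) at the period."""
    first, second = score.split(".")
    pair_score = int(first)
    if pair_score not in MATCH_CATEGORIES:
        raise ValueError(f"unexpected record pair score: {score}")
    return PairScore(pair_score, second, MATCH_CATEGORIES[pair_score])


print(parse_pair_score("3.45"))
# → PairScore(record_pair_score=3, strength='45', category='high')
```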

Stitched records

A stitched record is a unique output of the Stitch process that associates an Amperity ID with an individual customer record.

The Stitched Records tab in the Data Explorer shows a table with a row for each of the individual records that share the same Amperity ID.

The data explorer, showing stitched records.

Configure Stitch

Stitch is pre-configured to:

  • Perform ID resolution against customer records that are tagged with profile semantics

  • Apply blocking strategies

  • Apply clustering algorithms

  • Apply common email address patterns

  • Apply thresholds for trivial duplicates

  • Identify supersized clusters

  • Prioritize foreign keys over separation keys

  • Apply matching thresholds

Caution

Stitch configuration does not require modification for most situations. In some cases, after consultation with your Amperity representative and closely investigating the results of Stitch output against your customer data, adjusting Stitch configuration settings may be helpful.

Common scenarios for additional tuning of Stitch outcomes include workflows that:

  • Block certain profile values from Stitch

  • Ensure certain values are included in Stitch, but blocked from the customer 360 database

  • Apply labels to Stitch output to resolve overclustering or underclustering of records

  • Include or exclude specific Amperity IDs

Note

Your Amperity administrator uses tools that are part of DataGrid to perform Stitch QA, a workflow that verifies the end-to-end quality of Stitch results. Stitch QA is a series of SQL queries that run against tables in the customer 360 database. Depending on the results of these queries, additional human effort is sometimes necessary to understand how and why Stitch produced certain outcomes, after which next steps can be identified.