Quick start: Identity resolution

Dec 20, 2024

10 min read; ~1 hour to complete

Amperity Customer Data Cloud specializes in using AI to turn raw data into a growing library of robust unified datasets and durable customer profiles that are available to support all of your brand’s use cases.

By the end of this guide you will know how to do the following:

  1. Sync data from your Databricks account to Amperity.

  2. Apply semantic tags to your data sources using AmpAI.

  3. Build an identity graph that links disparate profile records together using a unique and persistent identifier.

  4. Sync unified tables from Amperity to your Databricks account.

Quick Start data model

The following diagram shows the data model for the sample data that is part of the Amperity Quick Start. Color coded sections identify which groups of tables are associated with source customer profiles, stitched domain tables, and unified tables.

The sample data model for Amperity Quick Start.

Prerequisites

To follow along with this quick start guide you will need:

  1. Access to an Amperity account.

  2. Approximately 1 hour to complete all of the steps within the quick start guide.

    Note

    More time may be required if you want to sync data from your Databricks account instead of using Amperity sample data.

  3. Source data. Amperity provides a set of sample data that can be used to complete the steps in this guide. You may provide your own data if you have a Databricks account with customer profile data and permissions to configure your Databricks account to accept Delta Sharing with Amperity from the Databricks Unity Catalog.

    Note

    Amperity sample data contains ~10 million customer records. Additional time may be necessary for loading and processing data if you choose to use your own data instead of Amperity sample data assets, depending on the number of records.

Log in to Amperity

To start using the Amperity quick start tenant, do the following:

  1. Log in to Amperity.

  2. Open the Quick Start page. This is located in the left-side menu at the top.

  3. Under Identity resolution click Set up.

    This opens the Identity resolution page that will walk you through steps for connecting to Databricks, adding semantic tags to synced tables, running Stitch, and then syncing unified tables back to Databricks.

Connect to Databricks

Use Amperity Bridge to connect Databricks to Amperity. This guide uses Amperity sample data assets, but the steps for Databricks are almost the same if you want to use your own data.

You have two options:

  1. Use the provided quick start data assets

  2. Use your own data from your own instance of Databricks

    Note

    For this option you will need a Databricks account, a configured Unity Catalog, and the ability to set up and manage Delta Sharing. Use these steps to configure your Databricks account to share data with Amperity.

To connect to Amperity sample data

Step 1.

In the Identity resolution quick start, next to Inbound sharing data click Add bridge.

Add a bridge for a sync.

This opens the Add bridge dialog box.

Step 2.

Choose Sample data. This will open the Select tables dialog box.

Step 3.

Use the Select tables to share dialog box to select the sample data from “amperity-trial/trial-data”.

Select schemas and tables to be shared.

When finished, click Create. This will open the Domain table mapping dialog box.

Step 4.

In the Domain table mapping dialog box, review the table names for the sample data, and then click Save and sync.

Map inbound synced tables to domain tables.

This will start the sync between Amperity and Databricks. Wait for the sync to finish before continuing to the next step. (Amperity sample data should sync in about 3 minutes.)

Add semantic tags

Semantic tags are applied to fields in incoming data sources to indicate the type of data that is contained within those fields.

The semantic tag tells Amperity how to treat the data, regardless of how the data is formatted, named, or originally stored.

For example, a field named evar_15 contains email addresses. This field should have the email semantic tag applied to it. This tag tells Amperity that the contents of the evar_15 field are:

  • Email addresses

  • Personally identifiable information (PII)

Email addresses are an important part of the identity resolution process. Using a semantic tag to tell Amperity which fields in your data sources contain email addresses (and PII!) saves you a lot of time because you don’t have to do any data processing, ETLs, or data modeling before making that data source available to Amperity.

Connect the data, apply the semantic tag, and build customer profiles.
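The idea behind semantic tagging can be sketched in a few lines of Python. This is only an illustration of the concept, not Amperity's implementation: the PII_TAGS set, the TaggedField class, the tag_fields helper, and the evar_22 field are all hypothetical names introduced for this example. The point is that the tag, not the column name, tells downstream processing what a field contains.

```python
from dataclasses import dataclass

# Hypothetical tag catalog for illustration only. The real set of semantic
# tags and their PII classification is defined by Amperity, not this sketch.
PII_TAGS = {"email", "phone", "given-name", "surname", "address"}


@dataclass(frozen=True)
class TaggedField:
    source_field: str  # column name as it appears in the source table
    semantic_tag: str  # what the column actually contains

    @property
    def is_pii(self) -> bool:
        # A field is treated as PII when its tag is in the catalog above.
        return self.semantic_tag in PII_TAGS


def tag_fields(mapping: dict[str, str]) -> list[TaggedField]:
    """Turn a {source_field: semantic_tag} mapping into tagged fields."""
    return [TaggedField(field, tag) for field, tag in mapping.items()]


# evar_15 comes from the example above; evar_22 and loyalty_tier are
# made-up fields, and "loyalty-tier" is a made-up, non-PII tag.
fields = tag_fields({
    "evar_15": "email",
    "evar_22": "phone",
    "loyalty_tier": "loyalty-tier",
})

emails = [f.source_field for f in fields if f.semantic_tag == "email"]
pii = [f.source_field for f in fields if f.is_pii]
```

Here the opaque name evar_15 never has to be renamed or remodeled; the email tag alone marks it as an email address and as PII.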

You have two options:

  1. Let AmpAI apply semantic tags (this section)

  2. Manually apply semantic tags

To let AmpAI apply semantic tags

Step 1.

In the Identity resolution quick start, if you are using Amperity sample data, next to Identity tables, click AmpAI select.

AmpAI will analyze the sample data and identify which tables contain PII, and then identify which semantic tags should be applied. You may change the tags AmpAI assigns to fields.

Click Continue. Wait for AmpAI to finish applying semantic tags before continuing to the next step. This process will take up to 5 minutes to complete.

Step 2.

When AmpAI is finished applying semantic tags, next to Identify your fields, click Edit.

This opens the Semantic tag editor. For each table to which AmpAI applied semantic tags, a list of fields, field types, and semantic tags is shown.

AmpAI will correctly assign semantic tags to all of the sample data tables, so you can click the Save button in the top right.

Important

If you are using your own data, review the fields carefully. AmpAI will apply semantic tags for PII correctly most of the time, but it’s good to double-check. If a tag looks wrong, remove the tag AmpAI applied and choose the correct semantic tag.

Run Stitch

After semantic tags have been applied to every source table that needs them, you are ready to run Stitch.

To build the identity graph

Step 1.

In the Identity resolution quick start, next to Generate Amperity IDs, click Run Stitch.

Wait for Stitch to finish running before continuing to the next step. This process will take 20-30 minutes to complete for Amperity sample data.

Note

The amount of time it will take to complete against your own data depends on the volume of data that is made available to Stitch, the number of unique data sources with PII, and the complexity of matching individual records across data with unique customer profiles.

Step 2.

The Identity resolution quick start will refresh to show high-level results of identity resolution similar to:

Results of identity resolution in Amperity quick start.

Click the box to open the Identity resolution page. This page shows a summary and a collection of benchmarks, along with access to the configuration that was used to get these results.

Step 3.

The Summary tab shows a comparison of domain tables and the record pairs identified both within each data source and across all data sources. This is presented as an UpSet Plot chart with links to the underlying data sources.

Step 4.

The Benchmark tab shows the results of a series of tests that are run by Amperity, grouped by “Optimal” and “Abnormal”.

Optimal benchmarks are shown when test results are within the typical range for most brands.

Optimal benchmark results.

Abnormal benchmarks are shown when test results are above or below the typical range for most brands.

Abnormal benchmark results.

For each test with abnormal results:

  1. Step through and grade the result as a “Good example” or “Poor example”. When benchmark grading is finished click Next.

  2. Review the list of steps you can take to improve customer profile quality.

Note

Amperity sample data will show mostly abnormal benchmarks. This is because the data is generated and does not represent real customer profile data. If you used your own customer profile data you should expect to see more optimal benchmarks and more actionable abnormal benchmarks.
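As a rough mental model of what an identity graph does, the following Python sketch links records that share an exact email address or phone number and gives every connected group a single identifier. This is a deliberately simplified stand-in, not the Stitch algorithm: it uses a union-find structure with exact matching only and produces no confidence scores. The build_identity_groups function, the record layout, and the sample values are all assumptions made for illustration.

```python
def build_identity_groups(records):
    """Group records that share a normalized email or phone value.

    records: list of dicts with "id", "email", and "phone" keys.
    Returns a dict mapping each record id to the id of its group root.
    """
    parent = {}

    def find(x):
        # Locate the group root for x, shortening the path as we go.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        # Merge two groups; the smaller id becomes the root so that the
        # result does not depend on the order of the input records.
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    first_seen = {}  # (field, normalized value) -> first record id seen
    for rec in records:
        find(rec["id"])  # register the record even if it links to nothing
        for field in ("email", "phone"):
            value = rec.get(field)
            if not value:
                continue
            key = (field, value.strip().lower())
            if key in first_seen:
                union(rec["id"], first_seen[key])
            else:
                first_seen[key] = rec["id"]

    return {rec["id"]: find(rec["id"]) for rec in records}


# Made-up records from three hypothetical sources.
records = [
    {"id": "crm-1", "email": "Pat@Example.com", "phone": "555-0100"},
    {"id": "web-7", "email": "pat@example.com", "phone": None},
    {"id": "pos-3", "email": None, "phone": "555-0100"},
    {"id": "crm-2", "email": "lee@example.com", "phone": "555-0199"},
]
groups = build_identity_groups(records)
```

In this sample, web-7 and pos-3 share no value with each other, yet both end up in the same group because each links to crm-1. That transitive linking is the property that lets one persistent identifier cover records from disparate sources.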

Create database

A customer 360 database is built using standard core tables that are generated by the Stitch process. These tables provide a unified view of your brand’s customer data, including customer profiles and interaction records, that is organized, merged, and linked together by the Amperity ID.

To create a Customer 360 database

Step 1.

Open the Customer 360 page, select the Databases tab, and then click Create Database.

Give the database a name and set the value for Template to “Customer 360”. You can keep the default “Admin” permissions. Click Create.

Step 2.

The Database Editor page opens.

The following tables will be in the customer 360 database:

  • Customer_360. A standardized table with the most complete set of customer profile data that is built from merge rules with a single row for each unique Amperity ID.

  • Merged_Customers. A standardized table that contains configurable merge rules.

  • Unified_Coalesced. A standardized table that contains all original data used to build the identity graph.

  • Unified_Scores. A standardized table that contains the edges of the identity graph with confidence scores for each linked record.

Click Activate. This will return you to the Customer 360 page.

Step 3.

For the database you just created click Run. This will load records to each of the customer 360 database tables.

Wait for the data to finish loading before continuing to the next step. This process will take 3-5 minutes to complete for Amperity sample data.
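To illustrate what "a single row for each unique Amperity ID" means in the Customer_360 table, here is a minimal Python sketch that collapses several source rows into one profile per ID. The merge rule shown (keep the non-empty value from the most recently updated row) is an assumption chosen for the example; the actual merge rules are configured in Amperity and may differ. The merge_profiles function, the column names, and the sample rows are hypothetical.

```python
def merge_profiles(rows, fields):
    """Collapse source rows into one profile per amperity_id.

    Hypothetical merge rule: for each field, keep the non-empty value
    from the most recently updated row that has one.
    """
    merged = {}
    # Process oldest rows first so newer non-empty values overwrite older ones.
    # ISO 8601 date strings sort correctly as plain strings.
    for row in sorted(rows, key=lambda r: r["updated"]):
        amp_id = row["amperity_id"]
        profile = merged.setdefault(amp_id, {"amperity_id": amp_id})
        for field in fields:
            if row.get(field):
                profile[field] = row[field]
    return merged


# Made-up rows that already carry an ID assigned by identity resolution.
rows = [
    {"amperity_id": "a1", "updated": "2024-01-05",
     "email": "pat@example.com", "phone": ""},
    {"amperity_id": "a1", "updated": "2024-06-12",
     "email": "", "phone": "555-0100"},
    {"amperity_id": "a1", "updated": "2024-03-01",
     "email": "pat.new@example.com", "phone": ""},
    {"amperity_id": "b2", "updated": "2024-02-20",
     "email": "lee@example.com", "phone": "555-0199"},
]
profiles = merge_profiles(rows, ["email", "phone"])
```

For ID a1, the merged profile takes the email from the March row and the phone from the June row, because those are the newest rows with a non-empty value for each field.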

Sync identity data to Databricks

Important

The Amperity quick start for identity resolution does not intend for you to sync ~10 million records of fake data to your Databricks Unity Catalog. This section assumes that you are sending real customer profiles from Amperity to Databricks and is a shortened version of the documentation about syncing data from Amperity to Databricks.

Amperity can sync customer profiles to your Databricks account.

Note

Additional configuration in Databricks is often required. Syncing data from Amperity to Databricks can use the same credentials; however, the configuration within Databricks is not the same as syncing data from Databricks to Amperity.

To sync data to Databricks, review the prerequisites, add an outbound bridge, select tables to share with Databricks, download the credentials file, add the provider in Databricks, add catalog from share, and then verify table sharing.

Conclusion and next steps

This quick start guide describes how to connect Amperity to Databricks and to configure Amperity to perform identity resolution against data that is synced from Databricks.

Amperity can do a lot more: