Quick start: Identity resolution
Amperity Customer Data Cloud specializes in using AI to turn raw data into a growing library of robust unified datasets and durable customer profiles that are available to support all of your brand’s use cases.
By the end of this guide you will know how to do the following:
Sync data from your Databricks account to Amperity.
Apply semantic tags to your data sources using AmpAI.
Build an identity graph that links disparate profile records together using a unique and persistent identifier.
Sync unified tables from Amperity to your Databricks account.
Sign up for a free trial of Amperity!
Sign up for a free trial to see how your brand can use Amperity to build robust unified datasets and durable customer profiles that support all of your brand’s use cases. You may use trial data provided by Amperity or you may upload samples of your own customer data.
Quick Start data model
The following diagram shows the data model for the sample data that is part of the Amperity Quick Start. Color-coded sections identify which groups of tables are associated with source customer profiles, stitched domain tables, and unified tables.

Prerequisites
To follow along with this quick start guide, you will need:
Access to an Amperity account.
Approximately 1 hour to complete all of the steps in this guide.
Note
More time may be required if you want to sync data from your Databricks account instead of using Amperity sample data.
Source data. Amperity provides a set of sample data that you can use to complete the steps in this guide. You may use your own data instead if you have a Databricks account that contains customer profile data and you have permissions to configure Delta Sharing from Databricks Unity Catalog to Amperity.
Note
Amperity sample data contains ~10 million customer records. Additional time may be necessary for loading and processing data if you choose to use your own data instead of Amperity sample data assets, depending on the number of records.
Log in to Amperity
To start using the Amperity quick start tenant, do the following:
Log in to Amperity.
Open the Quick Start page, located at the top of the left-side menu.
Under Identity resolution, click Set up.
This opens the Identity resolution page, which walks you through connecting to Databricks, adding semantic tags to synced tables, running Stitch, and syncing unified tables back to Databricks.
Connect to Databricks
Use Amperity Bridge to connect Databricks to Amperity. This guide uses Amperity sample data assets, but the steps for Databricks are almost the same if you want to use your own data.
You have two options:
Use the provided quick start data assets
Use your own data from your own instance of Databricks
Note
For this option, you will need a Databricks account, a configured Unity Catalog, and the ability to set up and manage Delta Sharing. Use these steps to configure your Databricks account to share data with Amperity.
To connect to Amperity sample data
In the Identity resolution quick start, next to Inbound sharing data, click Add bridge. This opens the Add bridge dialog box. Choose Sample data to open the Select tables dialog box.
In the Select tables to share dialog box, select the sample data from “amperity-trial/trial-data”. When finished, click Create. This opens the Domain table mapping dialog box.
In the Sample data dialog box, review the table names, and then click Save and sync. This starts the sync between Amperity and Databricks. Wait for the sync to finish before continuing to the next step. Amperity sample data should sync in about 3 minutes.
Run Stitch
After semantic tags are applied to all of the source tables that require them, you are ready to run Stitch.
To build the identity graph
In the Identity resolution quick start, next to Generate Amperity IDs, click Run Stitch. Wait for Stitch to finish running before continuing to the next step. For Amperity sample data, this takes about 20 to 30 minutes.
Note
The time required for your own data depends on the volume of data made available to Stitch, the number of unique data sources that contain PII, and the complexity of matching individual records across data sources to unique customer profiles.
The Identity resolution quick start refreshes to show high-level results of identity resolution. Click the results box to open the Identity resolution page. This page shows a summary and a collection of benchmarks, and provides access to the configuration that was used to produce these results.
The Summary tab compares domain tables and shows the record pairs identified both within each data source and across all data sources. These are presented as an UpSet plot with links to the underlying data sources.
The Benchmark tab shows the results of a series of tests that Amperity runs, grouped as “Optimal” and “Abnormal”. Optimal benchmarks are shown when test results are within the typical range for most brands. Abnormal benchmarks are shown when test results are above or below the typical range for most brands. For each test with abnormal results:
Note
Amperity sample data shows mostly abnormal benchmarks because the data is generated and does not represent real customer profile data. If you use your own customer profile data, expect to see more optimal benchmarks and more actionable abnormal benchmarks.
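The matching logic inside Stitch is not described in this guide. As a toy illustration of the general idea behind an identity graph, where matched record pairs are chained into clusters and each cluster receives one identifier, the following Python sketch uses a union-find structure. It also splits pairs into within-source and cross-source counts, the same distinction the Summary tab draws. All table names, record keys, and the `toy-id-N` labels are hypothetical; real Amperity IDs are persistent across runs, which this sketch does not attempt to model.

```python
from collections import defaultdict

# A record is (source_table, record_key). All data below is hypothetical.
Record = tuple[str, str]


def find(parent: dict[Record, Record], r: Record) -> Record:
    """Return the root of r's cluster, compressing the path along the way."""
    root = r
    while parent[root] != root:
        root = parent[root]
    while parent[r] != root:
        nxt = parent[r]
        parent[r] = root
        r = nxt
    return root


def link_records(records: list[Record],
                 pairs: list[tuple[Record, Record]]) -> dict[Record, str]:
    """Merge matched pairs into clusters and give each cluster one toy ID."""
    parent = {r: r for r in records}
    for a, b in pairs:
        root_a, root_b = find(parent, a), find(parent, b)
        if root_a != root_b:
            parent[root_b] = root_a
    clusters: dict[Record, list[Record]] = defaultdict(list)
    for r in records:
        clusters[find(parent, r)].append(r)
    ids: dict[Record, str] = {}
    # Number clusters by their smallest member so the output is deterministic.
    for n, members in enumerate(sorted(clusters.values(), key=min)):
        for r in members:
            ids[r] = f"toy-id-{n}"
    return ids


def count_pairs(pairs: list[tuple[Record, Record]]) -> dict[str, int]:
    """Split matched pairs into within-source and cross-source counts."""
    within = sum(1 for a, b in pairs if a[0] == b[0])
    return {"within_source": within, "cross_source": len(pairs) - within}


records = [("crm", "1"), ("crm", "2"), ("pos", "9"), ("web", "5")]
pairs = [(("crm", "1"), ("pos", "9")), (("crm", "1"), ("crm", "2"))]
ids = link_records(records, pairs)
print(ids)
print(count_pairs(pairs))  # {'within_source': 1, 'cross_source': 1}
```

Here the cross-source pair (crm 1, pos 9) and the within-source pair (crm 1, crm 2) chain three records into one cluster, while the unmatched web record gets its own ID.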
Create database
A Customer 360 database is built from standard core tables that are generated by Stitch. These tables provide a unified view of your brand’s customer data, including customer profiles and interaction records, that is organized, merged, and linked together by the Amperity ID.
To create a Customer 360 database
Open the Customer 360 page, select the Databases tab, and then click Create Database. Give the database a name and set Template to “Customer 360”. You can keep the default “Admin” permissions. Click Create.
The Database Editor page opens. The following tables will be in the Customer 360 database:
Click Activate. This returns you to the Customer 360 page.
For the database you just created, click Run. This loads records into each of the Customer 360 database tables. Wait for the data to finish loading before continuing to the next step. For Amperity sample data, this takes about 3 to 5 minutes.
Sync identity data to Databricks
Important
The Amperity quick start for identity resolution is not intended for syncing ~10 million records of fake data to your Databricks Unity Catalog. This section assumes that you are sending real customer profiles from Amperity to Databricks and is a shortened version of the documentation about syncing data from Amperity to Databricks.
Amperity can sync customer profiles to your Databricks account.
Note
Additional configuration in Databricks is often required. Syncing data from Amperity to Databricks can use the same credentials; however, the configuration within Databricks differs from the configuration used to sync data from Databricks to Amperity.
To sync data to Databricks, review the prerequisites, add an outbound bridge, select tables to share with Databricks, download the credentials file, add the provider in Databricks, add catalog from share, and then verify table sharing.
Merge policy
Merge policy defines how Amperity maintains the Merged Customers table. The Merged Customers table collects PII from all source datasets and then collapses the best data into a single row that is unique by Amperity ID. Each row in the Merged Customers table represents a single customer’s best set of profile data.
Use merge policy to tell Amperity which tables are the most reliable sources of customer profile data.
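Amperity builds the Merged Customers table itself; the Python sketch below only illustrates the shape of the result described above. Several source rows that share an Amperity ID collapse into one row per ID, and each field takes the first non-null value from the most reliable table. The table names, field names, and the single table ranking are hypothetical, and the simple "first non-null by table rank" rule is an assumption standing in for the merge policy you configure in the Profile Builder.

```python
from collections import defaultdict
from typing import Optional

# Hypothetical source tables, ranked from most to least reliable.
TABLE_RANK = ["loyalty", "ecommerce", "pos"]
FIELDS = ["email", "phone"]

# Hypothetical stitched rows: several source records already share an Amperity ID.
rows = [
    {"amperity_id": "a1", "table": "pos", "email": "j@pos.example", "phone": "555-0100"},
    {"amperity_id": "a1", "table": "loyalty", "email": None, "phone": "555-0199"},
    {"amperity_id": "a1", "table": "ecommerce", "email": "j@shop.example", "phone": None},
    {"amperity_id": "a2", "table": "pos", "email": "m@pos.example", "phone": None},
]


def merge_customers(rows: list[dict], fields: list[str],
                    table_rank: list[str]) -> dict[str, dict[str, Optional[str]]]:
    """Collapse rows into one row per Amperity ID.

    For each field, keep the first non-null value, checking tables in rank order.
    """
    by_id: dict[str, list[dict]] = defaultdict(list)
    for row in rows:
        by_id[row["amperity_id"]].append(row)

    merged = {}
    for amperity_id, group in by_id.items():
        ranked = sorted(group, key=lambda r: table_rank.index(r["table"]))
        best = {"amperity_id": amperity_id}
        for field in fields:
            best[field] = next((r[field] for r in ranked if r[field] is not None), None)
        merged[amperity_id] = best
    return merged


for row in merge_customers(rows, FIELDS, TABLE_RANK).values():
    print(row)
```

For Amperity ID a1, the email comes from the ecommerce table because the higher-ranked loyalty row has no email, while the phone number comes from the loyalty row.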
To define merge policy
Source priority can be defined for names, physical addresses, email addresses, phone numbers, birthdates, and gender. To configure source priority for profile attributes, open the Profile Builder. For each profile attribute, use the icon to move the list of tables into the desired order, and then click Save.
How source priority works
Tables A, B, and C all contain a field with email addresses to which the email semantic tag is applied. They are ranked 1) table A, 2) table B, and 3) table C.
If the value in table A is “justin@email.com”, then table A has priority and the email address is “justin@email.com”.
If the value in table A is NULL and the value in table B is “justinc@email.com”, then table B has priority and the email address is “justinc@email.com”.
If the values in tables A and B are NULL and the value in table C is “justin.c@email.com”, then table C has priority and the email address is “justin.c@email.com”.
Source table precedence can also be defined for data sources that contain semantic tags that are not grouped by profile attribute. Precedence determines which tables your brand considers most likely to contain high-quality customer profile data. The list of domain tables under Source_Priority must contain at least one domain table that has been made available to Stitch and contains fields to which profile semantic tags have been applied. To configure source table precedence, open the Profile Builder. Under Source table precedence, use the icon to move the list of tables into the desired order, and then click Save.
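The rule in “How source priority works” is a first-non-null lookup down an ordered list of tables. The Python sketch below restates it with the tables A, B, and C from that example. It assumes that attributes without their own ranking fall back to the source table precedence list, and it checks the stated requirement that the precedence list include at least one table made available to Stitch with profile semantic tags. The function names, the `eligible_tables` set, and the fallback behavior are illustrative assumptions, not Amperity's implementation.

```python
from typing import Optional

# Hypothetical Profile Builder settings mirroring the tables A, B, and C example.
ATTRIBUTE_PRIORITY = {"email": ["A", "B", "C"]}  # per-attribute source priority
TABLE_PRECEDENCE = ["C", "B", "A"]               # source table precedence (assumed)


def resolve(attribute: str, values_by_table: dict[str, Optional[str]],
            attribute_priority: dict[str, list[str]],
            table_precedence: list[str]) -> Optional[tuple[str, str]]:
    """Return (table, value) for the first non-null value in priority order.

    Attributes without their own ranking fall back to table precedence
    (an assumption about how the two settings combine).
    """
    order = attribute_priority.get(attribute, table_precedence)
    for table in order:
        value = values_by_table.get(table)
        if value is not None:
            return table, value
    return None


def check_precedence(table_precedence: list[str], eligible_tables: set[str]) -> None:
    """Require at least one table made available to Stitch with profile semantic tags."""
    if not any(table in eligible_tables for table in table_precedence):
        raise ValueError("Source_Priority must include at least one eligible domain table")


print(resolve("email", {"A": "justin@email.com", "B": "justinc@email.com"},
              ATTRIBUTE_PRIORITY, TABLE_PRECEDENCE))  # ('A', 'justin@email.com')
print(resolve("email", {"A": None, "B": "justinc@email.com", "C": "justin.c@email.com"},
              ATTRIBUTE_PRIORITY, TABLE_PRECEDENCE))  # ('B', 'justinc@email.com')
print(resolve("email", {"A": None, "B": None, "C": "justin.c@email.com"},
              ATTRIBUTE_PRIORITY, TABLE_PRECEDENCE))  # ('C', 'justin.c@email.com')
```

Keeping per-attribute rankings separate from the table-level precedence list lets one table win for email while a different table wins for other semantic tags.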
Conclusion and next steps
This quick start guide describes how to connect Amperity to Databricks and to configure Amperity to perform identity resolution against data that is synced from Databricks.
Amperity can do a lot more:
Semantic tagging for transactions, loyalty programs, and more
Destinations for paid media, marketing automation, offline events
Queries and orchestrations