Quick start: Identity resolution¶
Amperity Customer Data Cloud specializes in using AI to turn raw data into a growing library of robust unified datasets and durable customer profiles that are available to support all of your brand’s use cases.
By the end of this guide you will know how to do the following:
Sync data from your Databricks account to Amperity.
Apply semantic tags to your data sources using AmpAI.
Build an identity graph that links disparate profile records together using a unique and persistent identifier.
Sync unified tables from Amperity to your Databricks account.
Quick Start data model¶
The following diagram shows the data model for the sample data that is part of the Amperity Quick Start. Color-coded sections identify which groups of tables are associated with source customer profiles, stitched domain tables, and unified tables.
Prerequisites¶
To follow along with this quick start guide you will need:
Access to an Amperity account.
Approximately 1 hour to complete all of the steps in this quick start guide.
Note
More time may be required if you want to sync data from your Databricks account instead of using Amperity sample data.
Source data. Amperity provides a set of sample data that can be used to complete the steps in this guide. You may provide your own data if you have a Databricks account with customer profile data and permissions to configure your Databricks account to accept Delta Sharing with Amperity from the Databricks Unity Catalog.
Note
Amperity sample data contains ~10 million customer records. Additional time may be necessary for loading and processing data if you choose to use your own data instead of Amperity sample data assets, depending on the number of records.
Log in to Amperity¶
To start using the Amperity quick start tenant, do the following:
Log in to Amperity.
Open the Quick Start page, which is located at the top of the left-side menu.
Under Identity resolution click Set up.
This opens the Identity resolution page that will walk you through steps for connecting to Databricks, adding semantic tags to synced tables, running Stitch, and then syncing unified tables back to Databricks.
Connect to Databricks¶
Use Amperity Bridge to connect Databricks to Amperity. This guide uses Amperity sample data assets, but the steps for Databricks are almost the same if you want to use your own data.
You have two options:
Use the provided quick start data assets
Use your own data from your own instance of Databricks
Note
For this option you will need a Databricks account, a configured Unity Catalog, and the ability to set up and manage Delta Sharing. Use these steps to configure your Databricks account to share data with Amperity.
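For the second option, the Databricks side of Delta Sharing can be sketched as a short list of SQL statements. The helper below only builds the statement text so it can be reviewed before running it in a Databricks SQL editor. The share name, recipient name, and table name are hypothetical placeholders, and the exact setup that Amperity Bridge expects (for example, how the recipient is configured) is an assumption here, so follow the Amperity and Databricks documentation for the authoritative steps.

```python
def delta_sharing_statements(share, recipient, tables):
    """Build Databricks SQL statements that share tables through Delta Sharing.

    All names are supplied by the caller; none of them come from Amperity.
    """
    statements = [f"CREATE SHARE IF NOT EXISTS {share};"]
    for table in tables:
        # Each table is referenced by its three-part Unity Catalog name.
        statements.append(f"ALTER SHARE {share} ADD TABLE {table};")
    statements.append(f"CREATE RECIPIENT IF NOT EXISTS {recipient};")
    statements.append(f"GRANT SELECT ON SHARE {share} TO RECIPIENT {recipient};")
    return statements


# Hypothetical names, for illustration only.
for statement in delta_sharing_statements(
    share="amperity_share",
    recipient="amperity_recipient",
    tables=["main.crm.customer_profiles"],
):
    print(statement)
```

Keeping the statements as plain text makes it easy to have a Databricks administrator review them, since creating shares and recipients requires elevated privileges in Unity Catalog.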
To connect to Amperity sample data
In the Identity resolution quick start, next to Inbound sharing data click Add bridge. This opens the Add bridge dialog box. Choose Sample data. This will open the Select tables dialog box.
Use the Select tables to share dialog box to select the sample data from “amperity-trial/trial-data”. When finished, click Create. This will open the Domain table mapping dialog box.
In the Sample data dialog, review the table names, and then click Save and sync. This will start the sync between Amperity and Databricks. Wait for the sync to finish before continuing to the next step. (Amperity sample data should sync in about 3 minutes.)
Run Stitch¶
After semantic tags have been applied to every source table that requires them, you are ready to run Stitch.
To build the identity graph
In the Identity resolution quick start, next to Generate Amperity IDs, click Run Stitch. Wait for Stitch to finish running before continuing to the next step. This process will take 20-30 minutes to complete for Amperity sample data.
Note
The amount of time it will take to complete against your own data depends on the volume of data that is made available to Stitch, the number of unique data sources with PII, and the complexity of matching individual records across data with unique customer profiles.
The Identity resolution quick start will refresh to show high-level results of identity resolution. Click the results box to open the Identity resolution page. This page shows a summary and a collection of benchmarks, along with access to the configuration that was used to get these results.
The Summary tab shows a comparison of domain tables and the record pairs identified both within each data source and across all data sources. This is presented as an UpSet Plot chart with links to the underlying data sources.
The Benchmark tab shows the results of a series of tests that are run by Amperity, grouped by “Optimal” and “Abnormal”. Optimal benchmarks are shown when test results are within the typical range for most brands. Abnormal benchmarks are shown when test results are above or below the typical range for most brands. For each test with abnormal results:
Note
Amperity sample data will show mostly abnormal benchmarks. This is because the data is generated and does not represent real customer profile data. If you used your own customer profile data you should expect to see more optimal benchmarks and more actionable abnormal benchmarks.
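To make the idea of an identity graph more concrete, the following is a minimal sketch that links records sharing an exact email or phone value and gives each linked group one stable identifier. It is not how Stitch works internally, which this guide does not describe; the field names (`source`, `pk`, `email`, `phone`), the exact-match rule, and the UUID-based identifier are all assumptions chosen for illustration.

```python
import uuid
from collections import defaultdict


def assign_ids(records, match_keys=("email", "phone")):
    """Group records that share any match-key value and give each group one ID."""
    parent = list(range(len(records)))

    def find(i):
        # Follow parent links to the group root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_seen = {}  # (key, normalized value) -> index of first record with it
    for idx, record in enumerate(records):
        for key in match_keys:
            value = record.get(key)
            if not value:
                continue
            token = (key, value.strip().lower())
            if token in first_seen:
                a, b = find(idx), find(first_seen[token])
                parent[max(a, b)] = min(a, b)  # merge the two groups
            else:
                first_seen[token] = idx

    clusters = defaultdict(list)
    for idx in range(len(records)):
        clusters[find(idx)].append(idx)

    ids = {}
    for members in clusters.values():
        # Hypothetical stand-in for a persistent ID: derived from the
        # alphabetically first "source/pk" in the group, so it is the same
        # on every run while that record remains in the group.
        anchor = min(f"{records[i]['source']}/{records[i]['pk']}" for i in members)
        cluster_id = str(uuid.uuid5(uuid.NAMESPACE_URL, anchor))
        for i in members:
            ids[i] = cluster_id
    return ids


sample = [
    {"source": "loyalty", "pk": "L-1", "email": "Ana@Example.com", "phone": None},
    {"source": "ecomm", "pk": "E-9", "email": "ana@example.com", "phone": "555-0100"},
    {"source": "pos", "pk": "P-4", "email": None, "phone": "555-0100"},
    {"source": "pos", "pk": "P-5", "email": "bo@example.com", "phone": None},
]
sample_ids = assign_ids(sample)
```

In this sample the first three records end up in one group: the first two share an email address and the second and third share a phone number, so the link is transitive. The fourth record has no value in common with the others and receives its own identifier.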
Create database¶
A customer 360 database is built using standard core tables that are generated by the Stitch process. These tables provide a unified view of your brand’s customer data, including customer profiles and interaction records, that is organized, merged, and linked together by the Amperity ID.
To create a Customer 360 database
Open the Customer 360 page, select the Databases tab, and then click Create Database. Give the database a name and set the value for Template to “Customer 360”. You can keep the default “Admin” permissions. Click Create.
The Database Editor page opens. The following tables will be in the customer 360 database:
Click Activate. This will return you to the Customer 360 page.
For the database you just created click Run. This will load records to each of the customer 360 database tables. Wait for the data to finish loading before continuing to the next step. This process will take 3-5 minutes to complete for Amperity sample data.
Sync identity data to Databricks¶
Important
The Amperity quick start for identity resolution does not intend for you to sync ~10 million records of fake data to your Databricks Unity Catalog. This section assumes that you are sending real customer profiles from Amperity to Databricks and is a shortened version of the documentation about syncing data from Amperity to Databricks.
Amperity can sync customer profiles to your Databricks account.
Note
Additional configuration in Databricks is often required. Syncing data from Amperity to Databricks can use the same credentials; however, the configuration within Databricks is not the same as syncing data from Databricks to Amperity.
To sync data to Databricks, review the prerequisites, add an outbound bridge, select tables to share with Databricks, download the credentials file, add the provider in Databricks, add catalog from share, and then verify table sharing.
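Before adding the provider in Databricks, it can help to confirm that the downloaded credentials file is intact. The sketch below assumes the file follows the open Delta Sharing profile format (a JSON object with `shareCredentialsVersion`, `endpoint`, `bearerToken`, and an optional `expirationTime`); whether the file downloaded from Amperity matches that format exactly is an assumption, and the function name is hypothetical.

```python
import json
from datetime import datetime, timezone

# Fields required by the Delta Sharing profile format (assumed to apply here).
REQUIRED_FIELDS = ("shareCredentialsVersion", "endpoint", "bearerToken")


def check_share_profile(text, now=None):
    """Return a list of problems found in a profile file; an empty list means none."""
    try:
        profile = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(profile, dict):
        return ["profile is not a JSON object"]

    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in profile]

    endpoint = profile.get("endpoint")
    if endpoint and not str(endpoint).startswith("https://"):
        problems.append("endpoint is not an https URL")

    expires = profile.get("expirationTime")
    if expires:
        now = now or datetime.now(timezone.utc)
        try:
            expiry = datetime.fromisoformat(str(expires).replace("Z", "+00:00"))
        except ValueError:
            problems.append("expirationTime could not be parsed")
        else:
            if expiry.tzinfo is None:
                # Treat timestamps without an offset as UTC.
                expiry = expiry.replace(tzinfo=timezone.utc)
            if expiry <= now:
                problems.append("bearer token has expired")
    return problems
```

To use it, read the downloaded file as text and pass the contents to `check_share_profile`. The check never prints the bearer token, which should be treated as a secret.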
Conclusion and next steps¶
This quick start guide describes how to connect Amperity to Databricks and to configure Amperity to perform identity resolution against data that is synced from Databricks.
Amperity can do a lot more:
Semantic tagging for transactions, loyalty programs, and more
Destinations for paid media, marketing automation, offline events
Queries and orchestrations