Graph database

A graph database stores relationships between data items. A graph database that contains Amperity data stores relationships between customer profiles, transactions, and the Amperity ID graph.

A graph database is a collection of vertices and edges:

  • A vertex is a data item in a graph database. A vertex is sometimes referred to as a node.

  • An edge is the relationship between two vertices in a graph database. Each edge has a type and must start with one vertex and end with another.

Important

This topic describes a graph database that uses the Gremlin data format, which is a supported data format for Amazon Neptune.

The graph database queries included with Amperity may be customized to support any graph database format and destination.

Graph database workflow

A graph database workflow has two main steps:

  1. Add query templates, and then run queries in Amperity to return data in a format that can be loaded by a graph database.

  2. Send the data to Amazon S3, Azure Blob Storage, Google Cloud Storage, or SFTP, and then load that data to a graph database.

  3. Enabling downstream use cases.

Add query templates

The query templates for the graph database are added to, and then accessed from the Queries page.

To add Graph Database queries

  1. From the Queries page click Create, and then select Add Folder. This opens the Create Folder dialog box.

  2. Name the folder “Graph Database”.

  3. From the Template drop-down, select Graph Database.

  4. Click Create. This will create a folder, into which a collection of draft Graph Database queries are added.

  5. Take a few minutes to review the queries that are added by the “Graph Database” query template.

Send to a graph database

Send graph database query results to one of the following destinations:

and then load that data to your preferred query database.

The following graph databases support using Gremlin and importing CSV files directly:

Other graph databases, such as JanusGraph require using APIs to convert, and then load the graph data.

Use cases

A graph database enables downstream use cases, such as:

  1. Supporting advanced data science teams who need to use internal tools and models to interact with Amperity data.

  2. Establishing links between profiles and preferences, and then using those results to augment data that is provided to a recommendation engine.

  3. Using graphs to support CCPA and GDPR compliance, such as by identifying one piece of data, and then traversing the identity graph.

About Gremlin

Gremlin is a path-oriented query language that is used for complex traversals of graph databases. Gremlin is a component of Apache Tinkerpop and is supported by most graph databases.

Amperity graph database templates provide a template that outputs to a CSV file that is easily integrated into a graph database that supports the Gremin graph traversal language.

Vertices

A vertex is a data item in a graph database. A vertex is sometimes referred to as a node.

Field

Required?

Description

~id

The unique identifier for the vertice.

name:type

The properties of the vertice. There is no limit to the number of properties that may be specified.

The format for this field is is the name of the field in the Amperity data table and its data type, separated by a colon. For example: given_name:String or birthdate:Date.

~label

The type of vertice that matches the object type. For Amperity, this is a customer profile, a Customer 360 profile, or a transaction.

Edges

An edge is the relationship between two vertices in a graph database. Each edge has a type and must start with one vertex and end with another.

Field

Required?

Description

~id

The unique identifier for the edge.

~from

The primary key for the starting vertice. This must correspond to a vertice with a matching ~id.

~to

The primary key for the ending vertice. This must correspond to a vertice with a matching ~id.

~label

The type of vertice that matches the object type. For Amperity, this is a customer profile, a Customer 360 profile, or a transaction.

name:type

The properties of the vertice. There is no limit to the number of properties that may be specified.

The format for this field is is the name of the field in the Amperity data table and its data type, separated by a colon. For example: given_name:String or birthdate:Date.

Amperity query templates

Use the default query templates to return vertices and edges for customer profiles, transactions, and the Amperity ID graph. Extend the query database templates to match your tenant.

The following default query templates are available:

Profile vertices

Profile vertices are derived from the Unified Coalesced table.

SELECT
  CONCAT(uc.pk,uc.datasource) AS "~id"
  ,uc.amperity_id AS "amperity_id:String"
  ,uc.datasource AS "datasource:String"
  ,uc.given_name AS "givenname:String"
  ,uc.surname AS "surname:String"
  ,uc.email AS "email:String"
  ,uc.phone AS "phone:String"
  ,uc.gender AS "gender:String"
  ,uc.birthdate AS "birthdate:Date"
  ,uc.address AS "address:String"
  ,uc.address2 AS "address2:String"
  ,uc.city AS "city:String"
  ,uc.state AS "state:String"
  ,uc.postal AS "postal:String"
  ,'Original_Profile' AS "~label"
FROM Unified_Coalesced AS uc

Profile edges

Profile edges link scores in the Unified Scores table to raw profiles

SELECT
  CONCAT(us.amperity_id, us.pk1, us.pk2) AS "~id"
  ,CONCAT(us.pk1,source1) AS "~from"
  ,CONCAT(us.pk2,source2) AS "~to"
  ,'Amperity ID Link' AS "~label"
  ,us.score AS "score:Double"
FROM Unified_Scores AS us

Customer 360 vertices

Customer 360 vertices are derived from the Customer 360 table. Vertices can be focused on the customer profile:

SELECT
  cus.amperity_id AS "~id"
  ,cus.given_name AS "givenname:String"
  ,cus.surname AS "surname:String"
  ,cus.email AS "email:String"
  ,cus.phone AS "phone:String"
  ,cus.gender AS "gender:String"
  ,cus.birthdate AS "birthdate:Date"
  ,cus.address AS "address:String"
  ,cus.address2 AS "address2:String"
  ,cus.city AS "city:String"
  ,cus.state AS "state:String"
  ,cus.postal AS "postal:String"
  ,'Amperity Profile' AS "~label"
FROM Customer_360 AS cus

Vertices can be focused on customer profile and transactions:

SELECT
  cus.amperity_id AS "~id"
  ,cus.given_name AS "givenname:String"
  ,cus.surname AS "surname:String"
  ,cus.email AS "email:String"
  ,cus.phone AS "phone:String"
  ,cus.gender AS "gender:String"
  ,cus.birthdate AS "birthdate:Date"
  ,cus.address AS "address:String"
  ,cus.address2 AS "address2:String"
  ,cus.city AS "city:String"
  ,cus.state AS "state:String"
  ,cus.postal AS "postal:String"
  ,cus.multi_purchase_brand AS "multi_purchase_brand:Bool"
  ,cus.multi_purchase_channel AS "multi_purchase_channel:Bool"
  ,cus.one_and_done AS "one_and_done:Bool"
  ,cus.early_repeat_purchaser AS "early_repeat_purchaser:Bool"
  ,cus.lifetime_preferred_purchase_brand AS "lifetime_preferred_purchase_brand:String"
  ,cus.lifetime_preferred_purchase_channel AS "lifetime_preferred_purchase_channel:String"
  ,cus.lifetime_order_revenue AS "lifetime_order_revenue:Double"
  ,cus.lifetime_order_frequency AS "lifetime_order_frequency:Int"
  ,cus.lifetime_total_items AS "lifetime_total_items:Int"
  ,cus.lifetime_largest_order_value AS "lifetime_largest_order_value:Double"
  ,cus.lifetime_average_order_value AS "lifelifetime_average_order_value:Double"
  ,cus.lifetime_average_item_price AS "lifetime_average_item_price:Double"
  ,cus.lifetime_average_num_items AS "lifetime_average_num_items:Double"
  ,'Amperity Profile' AS "~label"
FROM Customer_360 AS cus

Customer 360 edges

Customer 360 edges link raw profiles in the Unified Coalesced table to profiles in the Customer 360 table.

SELECT DISTINCT
  CONCAT(uc.pk, uc.amperity_id) AS "~id"
  ,CONCAT(uc.pk,uc.datasource) AS "~from"
  ,uc.amperity_id AS "~to"
  ,'Customer360' AS "~label"
FROM Unified_Coalesced AS uc

Transaction vertices

Transaction vertices are derived from the Unified Transactions table. Vertices should have a consistent primary key and should link to vertices derived from the Customer 360 table using the Amperity ID.

SELECT
  CONCAT(ut.pk, ut.amperity_id) AS "~id"
  ,ut.amperity_id AS "amperity_id:String"
  ,ut.datasource AS "datasource:String"
  ,ut.order_id AS "order_id:String"
  ,ut.store_id AS "store_id:String"
  ,ut.digital_channel AS "digital_channel:String"
  ,ut.purchase_channel AS "purchase_channel:String"
  ,ut.purchase_brand AS "purchase_brand:String"
  ,ut.order_datetime AS "order_datetime:Date"
  ,ut.order_revenue AS "order_revenue:Double"
  ,ut.order_quantity AS "order_quantity:Int"
  ,ut.order_returned_quantity AS "order_returned_quantity:Int"
  ,ut.order_canceled_quantity AS "order_canceled_quantity:Int"
  ,ut.order_returned_revenue AS "order_returned_revenue:Double"
  ,ut.order_canceled_revenue AS "order_canceled_revenue:Double"
  ,'Transaction' AS "~label"
FROM Unified_Transactions AS ut
WHERE ut.amperity_id IS NOT NULL AND ut.order_id IS NOT NULL

Transaction edges

Transaction edges link vertices derived from the Unified Transactions table to vertices derived from the Customer 360 table. Each link should have a unique primary key and the ~from value should be a unique ~id value from transactions data.

SELECT DISTINCT
  CONCAT(ut.pk, ut.amperity_id) AS "~id"
  ,CONCAT(ut.pk, ut.amperity_id) AS "~from"
  ,ut.amperity_id AS "~to"
  ,'Transaction Link' AS "~label"
FROM Unified_Transactions AS ut
WHERE ut.amperity_id IS NOT NULL AND ut.order_id IS NOT NULL