Amperity Bridge¶
Amperity Bridge allows users to share data between Amperity and a data lakehouse using industry-standard data formats. Each bridge can be quickly configured to enable inbound and/or outbound connections that give your brand access to shared tables without replicating data.
Advantages of Amperity Bridge include:
Fast setup: Connect Amperity to a lakehouse in minutes using sharing keys instead of integrations.
Zero copy: Control access to shared tables without replicating data across platforms. Build pipelines faster and consolidate your brand’s storage costs into a single location.
Scalable processing: Enrich massive volumes of data quickly. Data is not moved or transformed from where it resides. Model customer data directly in the lakehouse or model it in Amperity.
Live data: View customer data at rest in a lakehouse or in Amperity through a shared catalog. Explore and query data without waiting for refreshes or updates.
Amperity Learning Lab
Amperity Bridge enables data sharing between Amperity and data lakehouses. Each bridge can be quickly configured for inbound and outbound shares to give your brand access to shared tables without replication. Start with an overview of data warehouses, compare Databricks and Snowflake, and then learn how Amperity Bridge shares data between Amperity and Databricks. Open Learning Lab to learn more about how Amperity Bridge works. Registration is required.
Inbound shares¶
Delta Sharing is an open protocol for simple and secure sharing of live data between organizations without copying data to another system and regardless of which computing platforms are used.
A bridge represents a connection between Amperity and an external lakehouse. Each bridge may be configured for one inbound and one outbound connection.
An inbound share represents the configuration for how a shared dataset is made available from a lakehouse to Amperity, such as using the Delta Sharing open protocol to share data with Databricks.
An inbound share is configured in a series of steps across Databricks and Amperity.
Inbound prerequisites¶
Before you can create inbound sharing between Databricks and Amperity, a recipient and a share must be created in Databricks. After they are created, tables are added to the share and the recipient is granted access to the share.
The user who performs these actions may use the Databricks CLI or the Databricks Catalog Explorer and must be assigned the CREATE RECIPIENT, CREATE SHARE, USE CATALOG, USE SCHEMA, and SELECT permissions, along with the ability to grant the recipient access to the share.
The user who will create a recipient for sharing data from Databricks to Amperity must have CREATE RECIPIENT permissions in Databricks.
Note
If a Databricks notebook is used to create the recipient, the cluster must use Databricks Runtime 11.3 LTS (or higher) and must be running in shared or single-user access mode.
The user who will create a share in the Unity Catalog metastore must have CREATE SHARE permissions in Databricks.
The user who will add tables to a share must:
|
|
The user who grants the recipient access to the share must be one of the following:
|
|
The IP address for Amperity may need to be added to an allowlist. Most connections are made directly to your Amperity tenant. Use one of the following Amperity IP addresses for an allowlist that is required by an upstream system. The specific IP address to use depends on the location in which your tenant is hosted:
|
|
For bridges that connect to Databricks environments running in Microsoft Azure and are using storage account firewalls, the outbound subnet IDs for Amperity Bridge must be configured in Microsoft Azure using the Azure CLI.
Configure Databricks¶
To configure Databricks to share data with Amperity you will need to CREATE SHARE and add tables to that share, CREATE RECIPIENT, grant the recipient access to the share, and then get an activation link. The activation link allows a user to download a credential file that is required to configure inbound sharing in Amperity.
Note
The following section briefly describes using the Databricks Catalog Explorer to configure Databricks to be ready to share data with Amperity, along with links to Databricks documentation for each step. You may use the Databricks CLI if you prefer. Instructions for using the Databricks CLI are available from the linked pages.
To configure Databricks for inbound sharing to Amperity
A share is a securable object in Unity Catalog that can be configured to share tables with Amperity. Open the Databricks Catalog Explorer. Under Delta Sharing, choose Shared by me, then select Share data, and then create a share. After you have created the share you may add tables to the share. Click Add assets, and then select the tables to share.
A recipient in Databricks represents the entity that will consume shared data: Amperity. Configure the recipient for open sharing and to use token-based authentication. Open the Databricks Catalog Explorer. Under Delta Sharing, choose Shared by me, and then click New recipient to create a recipient. After the recipient is created, grant the recipient access to the share.
Open sharing uses token-based authentication. The credentials file that contains the token is available from an activation link. Use a secure channel to share the activation link with the user who will download the credentials file, and then configure Amperity for inbound sharing.
Important
You can download the credential file only once. Recipients should treat the downloaded credential as a secret and must not share it outside of their organization. If you have concerns that a credential may have been handled insecurely, you can rotate credentials at any time.
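The Catalog Explorer steps above can also be expressed as Databricks SQL. The following is a minimal sketch, not an Amperity-provided script: it only builds the statements as strings so they can be reviewed before being run in a Databricks SQL editor or notebook. The share, recipient, and table names are hypothetical placeholders, and the sketch assumes simple identifiers that do not need quoting.

```python
def sharing_statements(share, recipient, tables):
    """Build Databricks SQL statements that mirror the Catalog Explorer steps."""
    statements = [f"CREATE SHARE IF NOT EXISTS {share}"]
    # Add each table to the share using its catalog.schema.table name.
    statements += [f"ALTER SHARE {share} ADD TABLE {table}" for table in tables]
    # A recipient created without a sharing identifier uses open sharing,
    # which relies on token-based authentication.
    statements.append(f"CREATE RECIPIENT IF NOT EXISTS {recipient}")
    statements.append(f"GRANT SELECT ON SHARE {share} TO RECIPIENT {recipient}")
    # The output of DESCRIBE RECIPIENT is where the activation link is shown.
    statements.append(f"DESCRIBE RECIPIENT {recipient}")
    return statements


# Hypothetical names, for illustration only.
statements = sharing_statements(
    share="amperity_inbound",
    recipient="amperity",
    tables=["main.sales.orders", "main.sales.customers"],
)
for statement in statements:
    print(statement + ";")
```

Review the generated statements against the Databricks documentation for your workspace before running them; the user who runs them still needs the permissions listed in the prerequisites.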
Configure subnet IDs¶
For bridges that connect to Databricks environments running in Microsoft Azure and are using storage account firewalls, the outbound subnet IDs for Amperity Bridge must be configured in Microsoft Azure using the Azure CLI. This step is only required for Microsoft Azure storage accounts running in any of the following regions: az-prod East US 2, az-prod East US, or az-prod-en1 North Europe.
Important
The following command line examples use placeholders. Replace “myresourcegroup” and “mystorageaccount” with the names of the resource group and storage account that exist within your Microsoft Azure environment.
az-prod East US 2
az storage account network-rule add \
  --subnet "/subscriptions/e733fc0a-b51a-4e9d-b6bb-fffc216f4d87/resourceGroups/prod/providers/Microsoft.Network/virtualNetworks/prod/subnets/compute-spark-outbound" \
  --resource-group "myresourcegroup" \
  --account-name "mystorageaccount"
az-prod East US
az storage account network-rule add \
  --subnet "/subscriptions/e733fc0a-b51a-4e9d-b6bb-fffc216f4d87/resourceGroups/prod-compute-failover/providers/Microsoft.Network/virtualNetworks/prod-compute-failover/subnets/compute-spark-outbound" \
  --resource-group "myresourcegroup" \
  --account-name "mystorageaccount"
az-prod-en1 North Europe
az storage account network-rule add \
  --subnet "/subscriptions/0e2b72b5-de51-4c28-8ba3-355fc7db10b7/resourceGroups/prod-en1/providers/Microsoft.Network/virtualNetworks/vnet/subnets/compute-spark-outbound" \
  --resource-group "myresourcegroup" \
  --account-name "mystorageaccount"
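If you manage several storage accounts, the three commands above can be generated from one lookup table. The following is a sketch under stated assumptions: the subnet IDs are copied from the commands in this section, while the function name, the dictionary, and the resource group and storage account values are hypothetical. The script only prints the command; it does not run the Azure CLI.

```python
import shlex

# Subnet IDs copied from the az commands in this section, keyed by region label.
SUBNET_IDS = {
    "az-prod East US 2": (
        "/subscriptions/e733fc0a-b51a-4e9d-b6bb-fffc216f4d87"
        "/resourceGroups/prod/providers/Microsoft.Network"
        "/virtualNetworks/prod/subnets/compute-spark-outbound"
    ),
    "az-prod East US": (
        "/subscriptions/e733fc0a-b51a-4e9d-b6bb-fffc216f4d87"
        "/resourceGroups/prod-compute-failover/providers/Microsoft.Network"
        "/virtualNetworks/prod-compute-failover/subnets/compute-spark-outbound"
    ),
    "az-prod-en1 North Europe": (
        "/subscriptions/0e2b72b5-de51-4c28-8ba3-355fc7db10b7"
        "/resourceGroups/prod-en1/providers/Microsoft.Network"
        "/virtualNetworks/vnet/subnets/compute-spark-outbound"
    ),
}


def network_rule_command(region, resource_group, account_name):
    """Return the az command (as an argument list) for one region."""
    return [
        "az", "storage", "account", "network-rule", "add",
        "--subnet", SUBNET_IDS[region],
        "--resource-group", resource_group,
        "--account-name", account_name,
    ]


# Placeholder resource group and storage account names.
command = network_rule_command("az-prod East US", "myresourcegroup", "mystorageaccount")
print(shlex.join(command))
```

Keeping the subnet ID as a single argument avoids accidentally splitting the path when the command is wrapped across lines.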
Add inbound bridge¶
To add an inbound bridge
Open the Sources page. Under Inbound shares click Add bridge. This opens the Create bridge dialog box. Add the name for the bridge and a description or select an existing bridge, and then click Confirm.
Connect the bridge to Databricks by uploading the credential file that was downloaded from the activation link. There are two ways to upload the credential file:
After the credential file is uploaded, click Continue. This will open the Select tables to share dialog box.
Important
You can download the credential file only once. Recipients should treat the downloaded credential as a secret and must not share it outside of their organization. If you have concerns that a credential may have been handled insecurely, you can rotate credentials at any time.
Use the Select tables to share dialog box to select any combination of schemas and tables to be synced to Amperity. If you select a schema, all tables in that schema will be synced. Any new tables added later will need to be manually added to the sync. When finished, click Next. This will open the Domain table mapping dialog box.
Map the tables that are shared from Databricks to domain tables in Amperity. Tables that are shared with Amperity are added as domain tables.
Tip
Use a custom domain table to assign primary keys, apply semantic tags, and shape data within shared tables to support any of your Amperity workflows.
When finished, click Save and sync. This will start a workflow that synchronizes data from Databricks to Amperity and will create the mapped domain table names. You can manually sync tables that are shared with Amperity using the Sync option from the Actions menu for the inbound bridge.
Outbound shares¶
Delta Sharing is an open protocol for simple and secure sharing of live data between organizations without copying data to another system and regardless of which computing platforms are used.
A bridge represents a connection between Amperity and an external lakehouse. Each bridge may be configured for one inbound and one outbound connection.
An outbound share represents the configuration for how a shared dataset is made available to a lakehouse from Amperity, such as using the Delta Sharing open protocol to share data with Databricks.
An outbound share is configured in a series of steps across Databricks and Amperity.
Tip
If you have already installed and configured the Databricks CLI and have permission to configure catalogs and providers in Databricks, the configuration process for outbound shares takes about 5 minutes.
Outbound prerequisites¶
Before you can create outbound sharing between Amperity and Databricks you must have permission to create providers and catalogs in Databricks. You may create the provider from the Databricks user interface, using the Databricks CLI, or by using Python.
The user who will add the schema to a catalog in Databricks must have CREATE CATALOG permissions in Databricks.
A user who will run queries against tables in a schema must have SELECT permissions in Databricks. SELECT permissions may be granted on a specific table, on a schema, or on a catalog.
To use the Databricks CLI, it must be installed and configured on your workstation. For new users: If you have not already set up and configured the Databricks CLI, you will need to do the following:
The user who will run the Databricks CLI and add a schema to Databricks for outbound sharing from Amperity must have CREATE PROVIDER permissions in Databricks.
Add outbound bridge¶
To add an outbound bridge
Open the Destinations page. Under Outbound shares click Add bridge. This opens the Create bridge dialog box.
Add the name for the bridge and a description, and then set the duration for which the token will remain active. Optional. Expand Advanced Settings to restrict access to specific IPs or to a valid CIDR (for a range of IPs). Place each entry on a new line. When finished, click Create. This will open the Select tables to share dialog box, in which you will configure any combination of schemas and tables to share with Databricks.
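Before pasting IP restrictions into Advanced Settings, it can help to check that each line is a well-formed IP address or CIDR range. The following is a minimal sketch that uses the Python standard library; the function name is hypothetical and the addresses are documentation-only examples. It rejects CIDR entries that have host bits set, which is an assumption about what counts as "valid" and may differ from the validation that Amperity applies.

```python
import ipaddress


def parse_allowlist(text):
    """Split newline-separated entries into valid networks and rejected lines."""
    valid, invalid = [], []
    for line in text.splitlines():
        entry = line.strip()
        if not entry:
            continue  # ignore blank lines
        try:
            # A single IP address is treated as a /32 (or /128) network.
            # strict=True rejects ranges such as 10.0.0.1/24 (host bits set).
            valid.append(str(ipaddress.ip_network(entry, strict=True)))
        except ValueError:
            invalid.append(entry)
    return valid, invalid


# Example entries, one per line, using reserved documentation ranges.
entries = """203.0.113.7
198.51.100.0/24

not-an-ip
"""
valid, invalid = parse_allowlist(entries)
print("valid:", valid)
print("invalid:", invalid)
```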
Select tables to share¶
A shared dataset represents all databases and/or database tables that are configured for outbound sharing with another organization.
You can configure Amperity to share any combination of schemas and tables that are available from the Customer 360 page.
To select schemas and tables to share
After you have configured the settings for the bridge, click Next to open the Select tables to share dialog box. You may select any combination of schemas and tables. If you select a schema, all tables in that schema will be shared, including all changes made to all tables in that schema. When finished, click Save. This will open the Download credential dialog box, from which you will download the credentials.share file that is required by the Databricks CLI when creating a catalog in Databricks.
When a bridge is already configured, you may edit the list of schemas and tables that are shared. From the Destinations page, under Outbound shares, open the Actions menu for a bridge, and then click Edit. This will open the Select tables to share dialog box.
Download credential file¶
There are two ways to download the credential file:
Click the Download credential button as part of the steps shown when you configure a bridge by clicking the Add bridge button located under Outbound shares on the Destinations page.
Choose the Download credential option from the Actions menu for an outbound share.
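After downloading the credential file, a quick sanity check can catch a truncated file or an expired token before it is handed to Databricks. The sketch below assumes the file follows the Delta Sharing profile file format (a JSON object with shareCredentialsVersion, endpoint, bearerToken, and an optional expirationTime); whether the Amperity file contains exactly these fields is an assumption, and the function names are hypothetical. The timestamp parsing assumes a UTC value ending in "Z".

```python
import json
from datetime import datetime, timezone

# Field names from the Delta Sharing profile file format (assumed to apply here).
REQUIRED_FIELDS = ("shareCredentialsVersion", "endpoint", "bearerToken")


def load_profile(path):
    """Read a credential file from disk and return it as a dictionary."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def profile_problems(profile, now=None):
    """Return a list of human-readable problems; an empty list means none found."""
    now = now or datetime.now(timezone.utc)
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in profile]
    expires = profile.get("expirationTime")
    if expires:
        try:
            # Assumes a UTC timestamp such as 2030-01-01T00:00:00.0Z;
            # fractional seconds are dropped before parsing.
            stamp = expires.rstrip("Z").split(".")[0]
            expiry = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)
        except ValueError:
            problems.append("expirationTime could not be parsed")
        else:
            if expiry <= now:
                problems.append("token has expired")
    return problems


# Illustrative profile with placeholder values (not a real credential).
example = {
    "shareCredentialsVersion": 1,
    "endpoint": "https://bridge.example.com/delta-sharing/",
    "bearerToken": "REDACTED",
    "expirationTime": "2030-01-01T00:00:00.0Z",
}
print(profile_problems(example, now=datetime(2025, 1, 1, tzinfo=timezone.utc)))
```

To check a downloaded file, pass the result of load_profile("path/to/config.share") to profile_problems. Avoid printing the profile itself, because it contains the bearer token.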
Add provider¶
Databricks supports a variety of methods for adding a provider to a catalog. Use the method that works best for your organization:
Databricks UI¶
You can create a provider directly from the Databricks user interface. Upload the Amperity share credentials directly as part of this process.
Open the Databricks user interface. Open Catalog Explorer, then Delta Sharing, and then Shared with me.
At the bottom of the Shared with me page, click the Import provider directly button. This opens the Import Provider dialog. Give the provider a name, and then upload the credential for the Amperity share. Click Import. This opens the providers page.
On the providers page, click Create catalog to add a catalog for the data that is shared from Amperity.
Databricks CLI¶
You can use the Databricks CLI to create a provider in Databricks. Attach the credentials that were downloaded from Amperity to the schema as part of the command that creates the bridge between Amperity and the provider in Databricks.
Open the Databricks CLI in a command window.
Run the databricks providers create command:
databricks providers create socktown \
  TOKEN \
  --recipient-profile-str "$(< path/to/config.share)"
where socktown is the name of the provider, TOKEN is the authentication type (enter it as shown), and “path/to/config.share” represents the path to the location into which the Amperity credentials file was downloaded.
Databricks CLI and Windows environments
If you are running the Databricks CLI using PowerShell, the command is similar to:
databricks providers create socktown `
  TOKEN `
  --recipient-profile-str `
  (Get-Content -Raw path\to\config.share)
If you are running the Databricks CLI using CMD, save commands similar to the following to a batch file, and then run the batch file:
setlocal enabledelayedexpansion
set "str="
for /f "delims=" %%a in (path\to\config.share) do set "str=!str!%%a"
databricks providers create socktown TOKEN ^
  --recipient-profile-str "!str!"
endlocal
A successful response from Databricks is similar to:
{
  "authentication_type": "TOKEN",
  "created_at": 1714696789105,
  "created_by": "user@socktown.com",
  "name": "socktown",
  "owner": "user@socktown.com",
  "recipient_profile": {
    "endpoint": "URL for Amperity bridge endpoint",
    "share_credentials_version": 1
  },
  "updated_at": 1714696789105,
  "updated_by": "user@socktown.com"
}
You must have CREATE PROVIDER permissions
An error message is returned when a user who runs the databricks providers create command does not have CREATE PROVIDER permissions to the Databricks metastore. This error is similar to:
Error: User does not have CREATE PROVIDER \
on Metastore '<metastore>'.
If you receive this error message:
|
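If you capture the JSON that the databricks providers create command returns, a few lines of Python can confirm that the provider was created with the expected name and authentication type. This is a sketch only: the function name is hypothetical, the sample text is the example response shown above, and the conversion assumes that created_at is expressed in epoch milliseconds.

```python
import json
from datetime import datetime, timezone


def summarize_provider(response_text):
    """Extract the fields that are most useful for a quick confirmation."""
    body = json.loads(response_text)
    return {
        "name": body["name"],
        "authentication_type": body["authentication_type"],
        # created_at is assumed to be epoch milliseconds; convert to UTC.
        "created_at": datetime.fromtimestamp(body["created_at"] / 1000, tz=timezone.utc),
    }


# The example response from this section.
sample = """{
  "authentication_type": "TOKEN",
  "created_at": 1714696789105,
  "created_by": "user@socktown.com",
  "name": "socktown",
  "owner": "user@socktown.com",
  "recipient_profile": {
    "endpoint": "URL for Amperity bridge endpoint",
    "share_credentials_version": 1
  },
  "updated_at": 1714696789105,
  "updated_by": "user@socktown.com"
}"""

summary = summarize_provider(sample)
print(summary["name"], summary["authentication_type"], summary["created_at"].isoformat())
```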
Python¶
You can use Python to create a provider from the Databricks UI. This requires the same information to be provided to Databricks as the CLI and is similar to:
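import requests

# Replace the placeholder values before running.
ACCESS_TOKEN = 'ACCESS_TOKEN'  # Databricks personal access token
workspace = 'WORKSPACE_NAME'
endpoint = 'api/2.1/unity-catalog/providers'
url = f'https://{workspace}.cloud.databricks.com/{endpoint}'

headers = {
    'Authorization': f'Bearer {ACCESS_TOKEN}'
}

# Send the contents of the credential file, not the path to the file.
with open('path/to/config.share') as f:
    recipient_profile = f.read()

data = {
    'name': 'BRIDGE_NAME',
    'authentication_type': 'TOKEN',
    'comment': 'Amperity Bridge',
    'recipient_profile_str': recipient_profile
}

response = requests.post(url, headers=headers, json=data)
response.raise_for_status()  # stop on an HTTP error
print(response.json())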
Add catalog from share¶
A catalog is the first layer in a Unity Catalog namespace and is used to organize data assets within Databricks.
To add a schema to a catalog in Databricks
Log in to Databricks, and then open the Catalog Explorer.
In the Catalog Explorer, expand Delta Sharing, and then select Shared with me. This will display the list of schemas to which you have access.
From the list of schemas, select the schema you just created. Click the Create catalog button, and then in the Create a new catalog dialog add the catalog name. A catalog name should clearly identify that data tables are shared from Amperity. For example: “amperity_socktown_outbound_share”. A catalog name cannot include a period, space, or forward slash. When finished, click Create.
You must have CREATE CATALOG permissions
An error message is returned when a user who attempts to add a schema to a catalog does not have CREATE CATALOG permissions to the Databricks metastore. This error is similar to:
Requires permission CREATE CATALOG \
on Metastore '<metastore>'.
If you receive this error message:
|
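The catalog naming rule in this section (no period, space, or forward slash) is easy to check before you open the Create a new catalog dialog. The following sketch uses hypothetical helper names. Replacing the disallowed characters with underscores and lowercasing the result are stylistic choices made for this example, not requirements stated by Amperity or Databricks.

```python
import re

# Characters that this section says a catalog name cannot include.
FORBIDDEN = {".": "period", " ": "space", "/": "forward slash"}


def catalog_name_problems(name):
    """List which disallowed characters appear in a proposed catalog name."""
    return [label for char, label in FORBIDDEN.items() if char in name]


def suggest_catalog_name(label):
    """Turn a descriptive label into a name that passes the check above."""
    # Collapse each run of periods, whitespace, or slashes into one underscore.
    return re.sub(r"[.\s/]+", "_", label.strip()).lower()


label = "Amperity Socktown outbound share"
print(catalog_name_problems(label))
print(suggest_catalog_name(label))
```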
Verify table sharing¶
Verify that the tables shared from Amperity are available from a catalog in Databricks.
To verify that tables were shared from Amperity to Databricks
From the Catalog Explorer in Databricks, expand Catalog, and then find the catalog that was created for sharing Amperity data.
Open the catalog, and then verify that the tables you shared from Amperity are available in the catalog.
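The same verification can be scripted against the Unity Catalog REST API, which lists the tables in a schema. The sketch below only builds the request and compares a response body with the tables you expected to share; it does not send anything over the network. The host, token, catalog, schema, and table names are hypothetical placeholders, and the response shape (a "tables" list whose items have a "name") is an assumption to confirm against the Databricks API reference.

```python
from urllib.parse import urlencode
from urllib.request import Request


def list_tables_request(host, token, catalog, schema):
    """Build (but do not send) a GET request that lists tables in a schema."""
    query = urlencode({"catalog_name": catalog, "schema_name": schema})
    url = f"https://{host}/api/2.1/unity-catalog/tables?{query}"
    return Request(url, headers={"Authorization": f"Bearer {token}"})


def missing_tables(expected, response_body):
    """Return the expected table names that are absent from the response."""
    found = {table["name"] for table in response_body.get("tables", [])}
    return sorted(set(expected) - found)


# Placeholder values; replace them with values from your own workspace.
request = list_tables_request(
    host="WORKSPACE_NAME.cloud.databricks.com",
    token="ACCESS_TOKEN",
    catalog="amperity_socktown_outbound_share",
    schema="customer360",
)
print(request.get_method(), request.full_url)

# A stand-in response body, used here instead of a live API call.
body = {"tables": [{"name": "customer_360"}, {"name": "merged_customers"}]}
print(missing_tables(["customer_360", "transaction_attributes"], body))
```

To run the check for real, send the request with urllib.request.urlopen, parse the JSON response, and pass it to missing_tables. An empty list means every expected table is visible in the catalog.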