Configure bridge for Databricks

Delta Sharing is an open protocol for simple and secure sharing of live data between organizations. Delta Sharing generates temporary credentials that allow access to individual data files in cloud storage without copying data to another system and regardless of which computing platforms are used.

About encryption, credentials, and audit logs

Delta Sharing uses end-to-end TLS encryption from client to server to storage account along with short-lived credentials, such as pre-signed URLs, to access data.

Review Databricks documentation for security best practices, including setting token lifetimes for metastores, rotating credentials, applying granularity for shares and recipients, configuring IP access lists, and audit logging.

Audit logging occurs in Databricks and in Amperity.

  • Audit logging in Amperity shows each user's actions and interactions, along with access to a history of workflows with tasks that use a bridge to sync data between Amperity and Databricks.

  • Databricks captures Delta Sharing provider events, which include logging for when a recipient (Amperity) accesses data.

From Databricks

A sync from Databricks to Amperity requires configuration steps to be made in both Amperity and Databricks.

  1. Get details

  2. Configure Databricks

  3. Configure subnet IDs (Microsoft Azure only)

  4. Add inbound bridge

Get details

Before you can create inbound sharing between Databricks and Amperity, a recipient and share must be created in Databricks, after which tables are added to the share and the recipient is granted access to the share.

The user who performs these actions may use the Databricks CLI or the Databricks Catalog Explorer and must be assigned the CREATE RECIPIENT, CREATE SHARE, USE CATALOG, USE SCHEMA, and SELECT permissions, along with the ability to grant the recipient access to the share.

Requirement 1.

The user who will create a recipient for sharing data from Databricks to Amperity must have CREATE RECIPIENT permissions in Databricks.

Note

If a Databricks notebook is used to create the recipient, the cluster must use Databricks Runtime 11.3 LTS (or higher) and must run in shared or single user access mode.

Requirement 2.

The user who will create a share in the Unity Catalog metastore must have CREATE SHARE permissions in Databricks.

Requirement 3.

The user who will add tables to a share must:

  • Be a share owner; Databricks recommends using a group as the share owner.

  • Have USE CATALOG and USE SCHEMA permissions on the catalog and schema in which the tables are located.

  • Have SELECT permissions on each table.

Requirement 4.

The user who grants the recipient access to the share must be one of the following:

  • A metastore administrator.

  • A user with delegated permissions or ownership on both the share and recipient objects.

    If the user created the recipient and share, they are the share owner and recipient owner.

    If the user did not create the recipient and share, they will need USE SHARE and SET SHARE PERMISSION on the share and USE RECIPIENT on the recipient.
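The rule in Requirement 4 reduces to a small boolean check: a metastore administrator always qualifies; anyone else needs rights on the share side and on the recipient side. This is a minimal sketch for reasoning about permissions; the `Principal` class and `can_grant_share_access` function are hypothetical and do not query Databricks.

```python
from dataclasses import dataclass, field


@dataclass
class Principal:
    """Hypothetical model of a Databricks user, for illustration only."""
    name: str
    is_metastore_admin: bool = False
    share_privileges: set = field(default_factory=set)
    recipient_privileges: set = field(default_factory=set)
    owns_share: bool = False
    owns_recipient: bool = False


def can_grant_share_access(p: Principal) -> bool:
    """Return True if p may grant a recipient access to a share."""
    if p.is_metastore_admin:
        return True
    # Share side: ownership, or both USE SHARE and SET SHARE PERMISSION.
    share_ok = p.owns_share or {"USE SHARE", "SET SHARE PERMISSION"} <= p.share_privileges
    # Recipient side: ownership, or USE RECIPIENT.
    recipient_ok = p.owns_recipient or "USE RECIPIENT" in p.recipient_privileges
    return share_ok and recipient_ok
```

A user who holds USE SHARE but not SET SHARE PERMISSION fails the share-side check, even with USE RECIPIENT.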

Requirement 5.

The IP address for Amperity may need to be added to an allowlist when subnet IDs are not configured in a Microsoft Azure environment.

Most connections are made directly to your Amperity tenant. Use one of the following Amperity IP addresses for an allowlist that is required by an upstream system. The specific IP address to use depends on the location in which your tenant is hosted:

  • On Amazon AWS use “52.42.237.53”

  • On Amazon AWS (Canada) use “3.98.199.97”

  • On Microsoft Azure use “104.46.106.84” and “20.81.91.210”

  • On Microsoft Azure (EU) use “20.123.127.54”
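Some allowlists accept only CIDR notation. A minimal sketch, assuming your upstream system accepts single-address /32 ranges, that turns the addresses listed above into allowlist entries; the dictionary keys and the `allowlist_entries` function are hypothetical labels, not Amperity identifiers.

```python
import ipaddress

# Amperity IP addresses by tenant hosting location, copied from the list above.
AMPERITY_IPS = {
    "aws": ["52.42.237.53"],
    "aws-canada": ["3.98.199.97"],
    "azure": ["104.46.106.84", "20.81.91.210"],
    "azure-eu": ["20.123.127.54"],
}


def allowlist_entries(location: str) -> list[str]:
    """Return one /32 CIDR entry per Amperity IP address for a location."""
    return [str(ipaddress.ip_network(f"{ip}/32")) for ip in AMPERITY_IPS[location]]
```

Tenants hosted on Microsoft Azure outside the EU need both addresses on the allowlist.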

Requirement 6.

For bridges that connect to Databricks environments running in Microsoft Azure and are using storage account firewalls, the outbound subnet IDs for Amperity Bridge must be configured in Microsoft Azure using the Azure CLI.

Configure Databricks

To configure Databricks to sync data with Amperity you will need to create a share and add tables to that share, create a recipient, grant the recipient access to the share, and then get an activation link. The activation link allows a user to download a credential file that is required to configure inbound sharing in Amperity.

Note

The following section briefly describes using the Databricks Catalog Explorer to configure Databricks to be ready to sync data with Amperity, along with links to Databricks documentation for each step. You may use the Databricks CLI if you prefer. Instructions for using the Databricks CLI are available from the linked pages.

To configure Databricks for inbound sharing to Amperity

Step 1.

A share is a securable object in Unity Catalog that can be configured to share tables with Amperity.

Open the Databricks Catalog Explorer. Under Delta Sharing, choose Shared by me, then select Share data, and then create a share.

After you have created the share you may add tables to the share. Click Add assets, and then select the tables to share.

Step 2.

A recipient in Databricks represents the entity that will consume shared data: Amperity. Configure the recipient for open sharing and to use token-based authentication.

Open the Databricks Catalog Explorer. Under Delta Sharing, choose Shared by me, and then click New recipient to create a recipient.

After the recipient is created, grant the recipient access to the share.
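If you prefer SQL to the Catalog Explorer, Steps 1 and 2 correspond to Databricks SQL statements. A minimal sketch that builds those statements as strings for review before you run them in a notebook or SQL editor; the `delta_sharing_setup_sql` helper and the share, recipient, and table names in the example are hypothetical. Omitting `USING ID` from `CREATE RECIPIENT` creates a recipient for open sharing, and `DESCRIBE RECIPIENT` returns its activation link.

```python
def delta_sharing_setup_sql(share: str, recipient: str, tables: list[str]) -> list[str]:
    """Build Databricks SQL statements for creating and granting a share (hypothetical helper)."""
    statements = [f"CREATE SHARE IF NOT EXISTS {share}"]
    # Each table is referenced by its three-level name: catalog.schema.table.
    statements += [f"ALTER SHARE {share} ADD TABLE {t}" for t in tables]
    # No USING ID clause: the recipient uses open sharing (token-based authentication).
    statements.append(f"CREATE RECIPIENT IF NOT EXISTS {recipient}")
    statements.append(f"GRANT SELECT ON SHARE {share} TO RECIPIENT {recipient}")
    # The output of DESCRIBE RECIPIENT includes the activation link.
    statements.append(f"DESCRIBE RECIPIENT {recipient}")
    return statements
```

For example, `delta_sharing_setup_sql("amperity_share", "amperity", ["main.sales.orders"])` returns five statements, one per line of the workflow described above.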

Step 3.

Open sharing uses token-based authentication.

The credentials file that contains the token is available from an activation link. Use a secure channel to share the activation link with the user who will download the credentials file, and then configure Amperity for inbound sharing.

Important

You can download the credential file only once. Recipients should treat the downloaded credential as a secret and must not share it outside of their organization. If you have concerns that a credential may have been handled insecurely, you can rotate credentials at any time.
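Because the credential file can be downloaded only once, it is worth checking it before uploading it to Amperity. A minimal sketch, assuming the file follows the open Delta Sharing profile format (a JSON object with `shareCredentialsVersion`, `endpoint`, `bearerToken`, and an optional `expirationTime`); the `check_profile` function is hypothetical and deliberately never returns the token.

```python
import json

REQUIRED_KEYS = {"shareCredentialsVersion", "endpoint", "bearerToken"}


def check_profile(text: str) -> dict:
    """Parse a Delta Sharing profile and confirm the required fields exist.

    Raises ValueError if a required field is missing. The bearer token is a
    secret, so only the endpoint and expiration time are returned.
    """
    profile = json.loads(text)
    missing = REQUIRED_KEYS - profile.keys()
    if missing:
        raise ValueError(f"credential file is missing: {sorted(missing)}")
    return {"endpoint": profile["endpoint"], "expires": profile.get("expirationTime")}
```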

Configure subnet IDs

For bridges that connect to Databricks environments running in Microsoft Azure and are using storage account firewalls, the outbound subnet IDs for Amperity Bridge must be configured in Microsoft Azure using the Azure CLI. This step is only required for Microsoft Azure storage accounts running in any of the following regions: az-prod East US 2, az-prod East US, or az-prod-en1 North Europe.

Important

The following command line examples use placeholders. Replace “myresourcegroup” and “mystorageaccount” with the names of the resource group and storage account that exist within your Microsoft Azure environment.

az-prod East US 2

az storage account network-rule add \
  --resource-group "myresourcegroup" \
  --account-name "mystorageaccount" \
  --subnet "/subscriptions/e733fc0a-b51a-4e9d-b6bb-fffc216f4d87/resourceGroups/prod/providers/Microsoft.Network/virtualNetworks/prod/subnets/compute-spark-outbound"

az-prod East US

az storage account network-rule add \
  --resource-group "myresourcegroup" \
  --account-name "mystorageaccount" \
  --subnet "/subscriptions/e733fc0a-b51a-4e9d-b6bb-fffc216f4d87/resourceGroups/prod-compute-failover/providers/Microsoft.Network/virtualNetworks/prod-compute-failover/subnets/compute-spark-outbound"

az-prod-en1 North Europe

az storage account network-rule add \
  --resource-group "myresourcegroup" \
  --account-name "mystorageaccount" \
  --subnet "/subscriptions/0e2b72b5-de51-4c28-8ba3-355fc7db10b7/resourceGroups/prod-en1/providers/Microsoft.Network/virtualNetworks/vnet/subnets/compute-spark-outbound"
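The subnet ID passed to `--subnet` must reach the Azure CLI as a single argument; a space before a line-continuation backslash would split it into several. A minimal sketch that rebuilds the three subnet IDs shown above from their parts and assembles the command as an argument list, which avoids shell quoting entirely; the `SUBNETS` table, `subnet_id`, and `network_rule_args` are hypothetical helpers.

```python
import subprocess

# (subscription, resource group, virtual network) for each region heading above.
SUBNETS = {
    "az-prod East US 2": ("e733fc0a-b51a-4e9d-b6bb-fffc216f4d87", "prod", "prod"),
    "az-prod East US": ("e733fc0a-b51a-4e9d-b6bb-fffc216f4d87",
                        "prod-compute-failover", "prod-compute-failover"),
    "az-prod-en1 North Europe": ("0e2b72b5-de51-4c28-8ba3-355fc7db10b7", "prod-en1", "vnet"),
}


def subnet_id(region: str) -> str:
    """Return the Amperity outbound subnet resource ID for a region."""
    subscription, resource_group, vnet = SUBNETS[region]
    return (f"/subscriptions/{subscription}/resourceGroups/{resource_group}"
            f"/providers/Microsoft.Network/virtualNetworks/{vnet}"
            "/subnets/compute-spark-outbound")


def network_rule_args(region: str, resource_group: str, account_name: str) -> list[str]:
    """Build the az network-rule command; the subnet ID stays one argument."""
    return ["az", "storage", "account", "network-rule", "add",
            "--subnet", subnet_id(region),
            "--resource-group", resource_group,
            "--account-name", account_name]


def add_network_rule(region: str, resource_group: str, account_name: str):
    """Run the command; assumes the az CLI is installed and you are logged in."""
    return subprocess.run(network_rule_args(region, resource_group, account_name), check=True)
```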

Add inbound bridge

Configure an inbound bridge to sync data from Databricks to Amperity.

To add an inbound bridge

Step 1.

Open the Sources page. Under Inbound shares click Add bridge.

Choose Databricks.

This opens the Add bridge dialog box.

Add a name and description for the bridge or select an existing bridge, and then click Confirm.

Step 2.

Connect the bridge to Databricks by uploading the credential file that was downloaded from the activation link. There are two ways to upload the credential file:

  1. Uploading the credentials as the second step when adding a bridge. Drop the file into the dialog box or browse to a location on your local machine.

  2. Choosing the Upload credential option from the Actions menu for a sync.


Important

You can download the credential file only once. Recipients should treat the downloaded credential as a secret and must not share it outside of their organization. If you have concerns that a credential may have been handled insecurely, you can rotate credentials at any time.

When finished, click Continue. This will open the Select tables dialog box.

Step 3.

Use the Select tables dialog box to select any combination of schemas and tables to be synced to Amperity.


If you select a schema, all tables in that schema will be synced. Any new tables added later will need to be manually added to the sync.

When finished, click Next. This will open the Domain table mapping dialog box.

Step 4.

Map the tables that are synced from Databricks to domain tables in Amperity.


Tables that are synced with Amperity are added as domain tables.

  • The names of synced tables must be unique among all domain tables.

  • Primary keys are not assigned.

  • Semantic tags are not applied.
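Because synced tables become domain tables, a name collision blocks the mapping. A minimal sketch of a pre-check you could run against a list of existing domain table names; `name_conflicts` is a hypothetical helper, and it compares names exactly because this page does not say whether Amperity treats names case-insensitively.

```python
def name_conflicts(synced: list[str], existing: list[str]) -> set[str]:
    """Return synced table names that collide with each other or with existing domain tables."""
    seen, conflicts = set(existing), set()
    for name in synced:
        if name in seen:
            conflicts.add(name)
        seen.add(name)
    return conflicts
```

An empty result means every synced table can be added under its mapped name.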

Tip

Use a custom domain table to assign primary keys, apply semantic tags, and shape data within synced tables to support any of your Amperity workflows.

When finished, click Save and sync. This will start a workflow that synchronizes data from Databricks to Amperity and creates domain tables with the mapped names.

You can manually sync tables with Amperity using the Sync option from the Actions menu for the bridge.

To Databricks

A sync from Amperity to Databricks requires configuration steps to be made in both Amperity and Databricks.

Tip

If you have already installed and configured the Databricks CLI and have permission to configure catalogs and providers in Databricks, the configuration process for outbound shares takes about 5 minutes.

  1. Get details

  2. Add outbound bridge

  3. Select tables to share

  4. Download credential file

  5. Add provider

  6. Add catalog from share

  7. Verify table sharing

Get details

Before you can create outbound sharing between Amperity and Databricks you must have permission to create providers and catalogs in Databricks. You may create the provider from the Databricks user interface, using the Databricks CLI, or by using Python.

Requirement 1.

The user who will add the schema to a catalog in Databricks must have CREATE CATALOG permissions in Databricks.

Requirement 2.

A user who will run queries against tables in a schema must have SELECT permissions in Databricks. SELECT permissions may be granted on a specific table, on a schema, or on a catalog.
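SELECT granted on a catalog or schema applies to the tables it contains. A minimal sketch of that inheritance for a three-level table name; `has_select` is hypothetical, and it does not model the USE CATALOG and USE SCHEMA privileges that Unity Catalog also requires on the parent objects before a table can be queried.

```python
def has_select(grants: set[str], table: str) -> bool:
    """Return True if SELECT is granted on the table or on its schema or catalog.

    grants holds names of securables that have SELECT, for example "main",
    "main.sales", or "main.sales.orders"; table is a three-level name.
    """
    catalog, schema, _ = table.split(".")
    candidates = {catalog, f"{catalog}.{schema}", table}
    return not grants.isdisjoint(candidates)
```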

Requirement 3.

To use the Databricks CLI, it must be installed and configured on your workstation.

For new users …

If you have not already set up and configured the Databricks CLI you will need to do the following:

  1. Install the Databricks CLI.

  2. Get a personal access token.

  3. Configure the Databricks CLI for your local machine.

    Run the databricks configure command, after which you will be asked to enter the hostname for your instance of Databricks along with your personal access token.
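The databricks configure command saves the hostname and token to a configuration file, usually `.databrickscfg` in your home directory, under a `[DEFAULT]` profile with `host` and `token` keys. A minimal sketch for confirming that the profile was written, without printing the token; `read_profile` is a hypothetical helper.

```python
import configparser


def read_profile(text: str, profile: str = "DEFAULT") -> dict:
    """Return the host for a profile and whether a token is set (never the token itself)."""
    cfg = configparser.ConfigParser()
    cfg.read_string(text)
    section = cfg[profile]
    return {"host": section.get("host"), "has_token": bool(section.get("token"))}
```

To check your own file, pass it the contents of `Path.home() / ".databrickscfg"` (from `pathlib`).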

The user who will run the Databricks CLI and add a schema to Databricks for outbound sharing from Amperity must have CREATE PROVIDER permissions in Databricks.

Add outbound bridge

Configure an outbound bridge to sync data from Amperity to Databricks.

To add an outbound bridge

Step 1.

Open the Destinations page. Under Outbound shares click Add bridge. This opens the Create bridge dialog box.

Step 2.

Add the name for the bridge and a description, and then set the duration for which the token will remain active.


Optional. You may restrict access to specific IP addresses or to a valid CIDR block (for a range of IP addresses). Place each entry on a separate line. Expand Advanced Settings to restrict access.
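A minimal sketch for checking a list of entries before pasting it into Advanced Settings: one IP address or CIDR block per line, blank lines ignored. `parse_allowed_ips` is hypothetical, and it rejects CIDR blocks with host bits set (such as `10.0.0.1/24`); whether Amperity accepts those is an assumption this sketch does not test.

```python
import ipaddress


def parse_allowed_ips(text: str) -> list:
    """Parse one IP address or CIDR block per line; raise ValueError on a bad entry."""
    networks = []
    for line in text.splitlines():
        entry = line.strip()
        if not entry:
            continue  # ignore blank lines
        # strict=True rejects blocks with host bits set, such as 10.0.0.1/24.
        networks.append(ipaddress.ip_network(entry, strict=True))
    return networks
```

A single address such as `203.0.113.7` is parsed as the one-address block `203.0.113.7/32`.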

When finished, click Create. This will open the Select tables dialog box, in which you will configure any combination of schemas and tables to share with Databricks.

Select tables to share

A shared dataset represents all databases and/or database tables that are configured for outbound sharing with another organization.

You can configure Amperity to share any combination of schemas and tables that are available from the Customer 360 page.

To select schemas and tables to share

Step 1.

After you have configured the settings for the bridge, click Next to open the Select tables dialog box.


You may select any combination of schemas and tables.

If you select a schema, all tables in that schema will be shared, including all changes made to all tables in that schema.

When finished, click Save. This will open the Download credential dialog box, from which you will download the credentials.share file that is required by the Databricks CLI when creating a catalog in Databricks.

Step 2.

When a bridge is already configured, you may edit the list of schemas and tables that are shared. From the Destinations page, under Outbound shares, open the Actions for a bridge, and then click Edit. This will open the Select tables dialog box.

Download credential file

There are two ways to download the credential file:

Step 1.

Click the Download credential button as part of the steps shown when you configure a bridge by clicking the Add bridge button located under Outbound shares on the Destinations page.

Step 2.

Choose the Download credential option from the Actions menu for an outbound share.

Add provider

Databricks supports a variety of methods for adding a provider to a catalog. Use the method that works best for your organization:

Databricks UI

You can create a provider directly from the Databricks user interface. Upload the Amperity share credentials directly as part of this process.

Step 1.

Open the Databricks user interface. Open Catalog Explorer, then Delta Sharing, and then Shared with me.

Step 2.

At the bottom of the Shared with me page, click the Import provider directly button. This opens the Import Provider dialog.


Give the provider a name, and then upload the credential for the Amperity share.

Click Import. This opens the providers page.

Step 3.

On the providers page, click Create catalog to add a catalog for the data that is shared from Amperity.

Databricks CLI

You can use the Databricks CLI to create a provider in Databricks. Pass the contents of the credential file that was downloaded from Amperity as part of the command that creates the provider.

Step 1.

Open a command window on a workstation on which the Databricks CLI is installed and configured.

Step 2.

Run the databricks providers create command:

$ databricks providers create socktown \
  TOKEN \
  --recipient-profile-str "$(< path/to/config.share)"

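where socktown is the name of the provider, TOKEN is the authentication type for the provider (enter it exactly as shown), and “path/to/config.share” represents the path to the location into which the Amperity credentials file was downloaded. The Databricks CLI authenticates to your workspace with the personal access token you entered when you ran the databricks configure command.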

Databricks CLI and Windows environments

If you are running the Databricks CLI using PowerShell, the command is similar to the following. PowerShell uses a backtick, not a backslash, to continue a command on the next line:

databricks providers create socktown `
  TOKEN `
  --recipient-profile-str `
    (Get-Content -Raw path\to\config.share)

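If you are running the Databricks CLI using CMD, save commands similar to the following in a batch file (for example, create-provider.cmd), and then run the batch file. The setlocal command and the %%a loop variable work only in batch files:

@echo off
setlocal enabledelayedexpansion
rem Read the credential file into the str variable, one line at a time.
set "str="
for /f "usebackq delims=" %%a in ("path\to\config.share") do set "str=!str!%%a"
databricks providers create socktown TOKEN --recipient-profile-str "!str!"
endlocal
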
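The shell-specific variants above exist mainly to pass the whole credential file as one argument. On any operating system, a Python wrapper can pass an argument list instead, so no shell quoting or escaping is involved. A minimal sketch, assuming the databricks executable is on your PATH and already configured; `provider_create_args` and `create_provider` are hypothetical helpers that build the same databricks providers create command shown above.

```python
import subprocess
from pathlib import Path


def provider_create_args(name: str, profile_text: str) -> list[str]:
    """Build the databricks providers create command as an argument list.

    TOKEN is the provider authentication type; profile_text is the full
    contents of the credential file, passed as a single argument.
    """
    return ["databricks", "providers", "create", name, "TOKEN",
            "--recipient-profile-str", profile_text]


def create_provider(name: str, share_file: str) -> subprocess.CompletedProcess:
    """Read the credential file and run the command, raising on a non-zero exit."""
    profile_text = Path(share_file).read_text()
    return subprocess.run(provider_create_args(name, profile_text),
                          check=True, capture_output=True, text=True)
```

For example, `create_provider("socktown", "path/to/config.share").stdout` holds the JSON response.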
Step 3.

A successful response from Databricks is similar to:

{
  "authentication_type":"TOKEN",
  "created_at":1714696789105,
  "created_by":"user@socktown.com",
  "name":"socktown",
  "owner":"user@socktown.com",
  "recipient_profile": {
    "endpoint":"URL for Amperity bridge endpoint",
    "share_credentials_version":1
  },
  "updated_at":1714696789105,
  "updated_by":"user@socktown.com"
}
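A minimal sketch for checking a saved response: it confirms the provider name, the TOKEN authentication type, and the endpoint, and converts created_at, which is in milliseconds since the Unix epoch, to a readable UTC timestamp. `summarize_provider` is a hypothetical helper.

```python
import json
from datetime import datetime, timedelta, timezone


def summarize_provider(response_text: str) -> dict:
    """Extract the fields worth checking from a providers create response."""
    provider = json.loads(response_text)
    # created_at is milliseconds since the Unix epoch (UTC).
    created = datetime(1970, 1, 1, tzinfo=timezone.utc) + timedelta(
        milliseconds=provider["created_at"])
    return {
        "name": provider["name"],
        "authentication_type": provider["authentication_type"],
        "endpoint": provider.get("recipient_profile", {}).get("endpoint"),
        "created_at": created.isoformat(),
    }
```

For the sample response above, created_at 1714696789105 corresponds to 2024-05-03 00:39:49 UTC.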

You must have CREATE PROVIDER permissions

An error message is returned when a user who runs the databricks providers create command does not have CREATE PROVIDER permissions to the Databricks metastore.

This error is similar to:

Error: User does not have CREATE PROVIDER on Metastore '<metastore>'.

If you receive this error message:

  1. Ask your Databricks administrator to assign to your Databricks user account the CREATE PROVIDER permission.

  2. Rerun the databricks providers create command.

Python

You can use Python to create a provider by calling the Databricks REST API. This requires the same information as the Databricks CLI and is similar to:

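import requests

# Placeholders: replace with your personal access token and workspace name.
ACCESS_TOKEN = "YOUR_PERSONAL_ACCESS_TOKEN"
workspace = "WORKSPACE_NAME"

headers = {
  "Authorization": f"Bearer {ACCESS_TOKEN}"
}
endpoint = "api/2.1/unity-catalog/providers"
# This URL format is for Databricks on AWS; use your own workspace URL.
url = f"https://{workspace}.cloud.databricks.com/{endpoint}"

# recipient_profile_str must contain the contents of the credential
# file downloaded from Amperity, not the path to the file.
with open("path/to/config.share") as f:
  recipient_profile = f.read()

data = {
  "name": "BRIDGE_NAME",
  "authentication_type": "TOKEN",
  "comment": "Amperity Bridge",
  "recipient_profile_str": recipient_profile
}

# Create the provider, and raise an error if the request fails.
response = requests.post(url, headers=headers, json=data)
response.raise_for_status()
print(response.json())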

Add catalog from share

A catalog is the first layer in a Unity Catalog namespace and is used to organize data assets within Databricks.

To add a schema to a catalog in Databricks

Step 1.

Log in to Databricks, and then open the Catalog Explorer.

Step 2.

In the Catalog Explorer, expand Delta Sharing, and then select Shared with me.

This will display the list of schemas to which you have access.

Step 3.

From the list of schemas, select the schema you just created.

Click the Create catalog button, and then in the Create a new catalog dialog add the catalog name. A catalog name should clearly identify that data tables are shared from Amperity. For example: “amperity_socktown_outbound_share”. A catalog name cannot include a period, space, or forward slash. When finished, click Create.
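The same catalog can also be created with the Databricks SQL statement CREATE CATALOG ... USING SHARE, where the share is identified as provider.share. A minimal sketch that checks the naming rule above before building the statement; `create_catalog_sql` is hypothetical, and the share name in the example is an assumption (running SHOW SHARES IN PROVIDER with your provider name lists the actual share names).

```python
FORBIDDEN = {".", " ", "/"}


def create_catalog_sql(catalog: str, provider: str, share: str) -> str:
    """Return a CREATE CATALOG ... USING SHARE statement after checking the name."""
    bad = FORBIDDEN.intersection(catalog)
    if bad:
        raise ValueError(f"catalog name contains {sorted(bad)}")
    return f"CREATE CATALOG IF NOT EXISTS {catalog} USING SHARE {provider}.{share}"
```

For example, `create_catalog_sql("amperity_socktown_outbound_share", "socktown", "amperity_share")` builds a statement for the provider created earlier, while a name such as “Amperity Socktown” is rejected because it contains a space.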

You must have CREATE CATALOG permissions

An error message is returned when a user who attempts to add a schema to a catalog does not have CREATE CATALOG permissions to the Databricks metastore.

This error is similar to:

Requires permission CREATE CATALOG on Metastore '<metastore>'.

If you receive this error message:

  1. Ask your Databricks administrator to assign to your Databricks user account the CREATE CATALOG permission.

  2. Click the Create catalog button and retry adding the schema to the catalog.

Verify table sharing

Verify that the tables shared from Amperity are available from a catalog in Databricks.

To verify that tables were shared from Amperity to Databricks

Step 1.

From the Catalog Explorer in Databricks, expand Catalog, and then find the catalog that was created for sharing Amperity data.

Step 2.

Open the catalog, and then verify that the tables you shared from Amperity are available in the catalog.
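For a longer list of tables, the check can be scripted. A minimal sketch that compares the tables you selected in Amperity with the names returned by a query such as SHOW TABLES IN your catalog and schema; `missing_tables` and the table names in the example are hypothetical, and the case-insensitive comparison assumes Unity Catalog's behavior of storing object names in lowercase.

```python
def missing_tables(expected: list[str], found: list[str]) -> list[str]:
    """Return expected tables that do not appear in the catalog (case-insensitive)."""
    found_lower = {name.lower() for name in found}
    return sorted(name for name in expected if name.lower() not in found_lower)
```

An empty list means every table shared from Amperity is available in the catalog.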
