Configure bridge for Databricks¶
Delta Sharing is an open protocol for simple and secure sharing of live data between organizations. Delta Sharing generates temporary credentials that allow access to individual data files in cloud storage without copying data to another system and regardless of which computing platforms are used.
About encryption, credentials, and audit logs
Delta Sharing uses end-to-end TLS encryption from client to server to storage account along with short-lived credentials, such as pre-signed URLs, to access data.
Review the Databricks documentation for security best practices, including setting token lifetimes for metastores, rotating credentials, applying granularity to shares and recipients, configuring IP access lists, and enabling audit logging.
Audit logging occurs in Databricks and in Amperity.
Audit logging in Amperity shows each user's actions and interactions, along with a history of workflows whose tasks use a bridge to sync data between Amperity and Databricks.
Databricks captures Delta Sharing provider events, which include a log entry each time a recipient (Amperity) accesses data.
From Databricks¶
A sync from Databricks to Amperity requires configuration in both Amperity and Databricks.
Configure subnet IDs (Microsoft Azure only)
Get details¶
Before you can create inbound sharing between Databricks and Amperity, a recipient and a share must be created in Databricks. Tables are then added to the share, and the recipient is granted access to the share.
The user who performs these actions may use the Databricks CLI or the Databricks Catalog Explorer and must be assigned the CREATE RECIPIENT, CREATE SHARE, USE CATALOG, USE SCHEMA, and SELECT permissions, along with the ability to grant the recipient access to the share.
The user who will create a recipient for sharing data from Databricks to Amperity must have CREATE RECIPIENT permissions in Databricks. Note: If a Databricks notebook is used to create the recipient, the cluster must use Databricks Runtime 11.3 LTS (or higher) and must be running in shared or single user access mode.
The user who will create a share in the Unity Catalog metastore must have CREATE SHARE permissions in Databricks.
The user who will add tables to a share must be the owner of the share and must have USE CATALOG and USE SCHEMA permissions on the catalog and schema that contain the table, along with SELECT permissions on the table itself.
The user who grants the recipient access to the share must be a metastore admin, the owner of both the share and the recipient objects, or a user with the permissions required to manage grants on the share.
When subnet IDs are not configured in a Microsoft Azure environment, the IP address for Amperity may need to be added to an allowlist in the upstream system. Most connections are made directly to your Amperity tenant. The specific Amperity IP address to use for the allowlist depends on the location in which your tenant is hosted.
For bridges that connect to Databricks environments running in Microsoft Azure and are using storage account firewalls, the outbound subnet IDs for Amperity Bridge must be configured in Microsoft Azure using the Azure CLI.
Configure Databricks¶
To configure Databricks to sync data with Amperity you will need to create a share and add tables to that share, create a recipient, grant the recipient access to the share, and then get an activation link. The activation link allows a user to download a credential file that is required to configure inbound sharing in Amperity.
Note
The following section briefly describes using the Databricks Catalog Explorer to configure Databricks to be ready to sync data with Amperity, along with links to Databricks documentation for each step. You may use the Databricks CLI if you prefer. Instructions for using the Databricks CLI are available from the linked pages.
To configure Databricks for inbound sharing to Amperity
A share is a securable object in Unity Catalog that can be configured to share tables with Amperity. Open the Databricks Catalog Explorer. Under Delta Sharing, choose Shared by me, select Share data, and then create a share. After you have created the share, you may add tables to it: click Add assets, and then select the tables to share.
A recipient in Databricks represents the entity that will consume shared data: Amperity. Configure the recipient for open sharing with token-based authentication. Open the Databricks Catalog Explorer. Under Delta Sharing, choose Shared by me, and then click New recipient to create a recipient. After the recipient is created, grant the recipient access to the share.
Open sharing uses token-based authentication. The credential file that contains the token is available from an activation link. Use a secure channel to share the activation link with the user who will download the credential file and configure Amperity for inbound sharing. Important: You can download the credential file only once. Recipients should treat the downloaded credential as a secret and must not share it outside of their organization. If you have concerns that a credential may have been handled insecurely, you can rotate credentials at any time.
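The downloaded credential file is a small JSON document. As a sketch (assuming the standard Delta Sharing profile fields — shareCredentialsVersion, endpoint, and bearerToken — and a hypothetical file path; the function name is illustrative, not part of Amperity or Databricks), you can sanity-check the file before configuring Amperity:

```python
import json

def check_profile(path: str) -> dict:
    """Load a config.share profile and confirm the expected fields are present."""
    with open(path) as f:
        profile = json.load(f)
    missing = [k for k in ("shareCredentialsVersion", "endpoint", "bearerToken")
               if k not in profile]
    if missing:
        raise ValueError(f"credential file is missing fields: {missing}")
    return profile

# profile = check_profile("path/to/config.share")
# print(profile["endpoint"])
```

A profile that fails this check was likely truncated or corrupted in transit and should be re-issued by rotating credentials in Databricks.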
Configure subnet IDs¶
For bridges that connect to Databricks environments running in Microsoft Azure and are using storage account firewalls, the outbound subnet IDs for Amperity Bridge must be configured in Microsoft Azure using the Azure CLI. This step is only required for Microsoft Azure storage accounts running in any of the following regions: az-prod East US 2, az-prod East US, or az-prod-en1 North Europe.
Important
The following command line examples use placeholders. Replace "myresourcegroup" and "mystorageaccount" with the names of the resource group and storage account that exist within your Microsoft Azure environment.
az-prod East US 2
az storage account network-rule add \
  --subnet "/subscriptions/e733fc0a-b51a-4e9d-b6bb-fffc216f4d87/resourceGroups/prod/providers/Microsoft.Network/virtualNetworks/prod/subnets/compute-spark-outbound" \
  --resource-group "myresourcegroup" \
  --account-name "mystorageaccount"
az-prod East US
az storage account network-rule add \
  --subnet "/subscriptions/e733fc0a-b51a-4e9d-b6bb-fffc216f4d87/resourceGroups/prod-compute-failover/providers/Microsoft.Network/virtualNetworks/prod-compute-failover/subnets/compute-spark-outbound" \
  --resource-group "myresourcegroup" \
  --account-name "mystorageaccount"
az-prod-en1 North Europe
az storage account network-rule add \
  --subnet "/subscriptions/0e2b72b5-de51-4c28-8ba3-355fc7db10b7/resourceGroups/prod-en1/providers/Microsoft.Network/virtualNetworks/vnet/subnets/compute-spark-outbound" \
  --resource-group "myresourcegroup" \
  --account-name "mystorageaccount"
Add inbound bridge¶
Configure an inbound bridge to sync data from Databricks to Amperity.
To add an inbound bridge
Open the Sources page. Under Inbound shares, click Add bridge. Choose Databricks. This opens the Add bridge dialog box. Add a name and a description for the bridge, or select an existing bridge, and then click Confirm.
Connect the bridge to Databricks by uploading the credential file that was downloaded from the activation link. There are two ways to upload the credential file.
Important: You can download the credential file only once. Recipients should treat the downloaded credential as a secret and must not share it outside of their organization. If you have concerns that a credential may have been handled insecurely, you can rotate credentials at any time.
After the credential file is uploaded, click Continue. This will open the Select tables dialog box.
Use the Select tables dialog box to select any combination of schemas and tables to be synced to Amperity. If you select a schema, all tables in that schema are synced; any tables added to the schema later must be added to the sync manually. When finished, click Next. This will open the Domain table mapping dialog box.
Map the tables that are synced from Databricks to domain tables in Amperity. Tables that are synced with Amperity are added as domain tables.
Tip: Use a custom domain table to assign primary keys, apply semantic tags, and shape data within synced tables to support any of your Amperity workflows.
When finished, click Save and sync. This starts a workflow that syncs data from Databricks to Amperity and creates the mapped domain tables. You can manually sync tables at any time using the Sync option from the Actions menu for the bridge.
To Databricks¶
A sync from Amperity to Databricks requires configuration in both Amperity and Databricks.
Tip
If you have already installed and configured the Databricks CLI and have permission to configure catalogs and providers in Databricks, the configuration process for outbound shares takes about 5 minutes.
Get details¶
Before you can create outbound sharing between Amperity and Databricks you must have permission to create providers and catalogs in Databricks. You may create the provider from the Databricks user interface, using the Databricks CLI, or by using Python.
The user who will add the schema to a catalog in Databricks must have CREATE CATALOG permissions in Databricks.
A user who will run queries against tables in a schema must have SELECT permissions in Databricks. SELECT permissions may be granted on a specific table, on a schema, or on a catalog.
To use the Databricks CLI, it must be installed and configured on your workstation. If you have not already set up and configured the Databricks CLI, do so before continuing.
The user who will run the Databricks CLI and add a schema to Databricks for outbound sharing from Amperity must have CREATE PROVIDER permissions in Databricks.
Add outbound bridge¶
Configure an outbound bridge to sync data from Amperity to Databricks.
To add an outbound bridge
Open the Destinations page. Under Outbound shares, click Add bridge. This opens the Create bridge dialog box.
Add the name for the bridge and a description, and then set the duration for which the token will remain active. Optional: you may restrict access to specific IP addresses or to a valid CIDR block (for a range of IPs). Expand Advanced Settings to restrict access, placing each entry on its own line. When finished, click Create. This will open the Select tables dialog box, in which you will configure any combination of schemas and tables to share with Databricks.
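Each access-restriction entry must be a single IP address or a valid CIDR block. As a rough sketch (the function name and sample addresses are illustrative, not part of Amperity), Python's ipaddress module can validate entries before you paste them into Advanced Settings:

```python
import ipaddress

def parse_allowlist(text: str) -> list:
    """Validate one IP or CIDR entry per line; a bare IP becomes a /32 (or /128)."""
    networks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # raises ValueError on an invalid IP or CIDR entry
        networks.append(ipaddress.ip_network(line, strict=False))
    return networks

entries = parse_allowlist("203.0.113.7\n198.51.100.0/24")
```

Validating entries up front avoids saving a bridge configuration that silently blocks the clients you intended to allow.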
Download credential file¶
There are two ways to download the credential file:
Click the Download credential button, which appears in the steps shown when you configure a bridge (click Add bridge under Outbound shares on the Destinations page).
Choose the Download credential option from the Actions menu for an outbound share.
Add provider¶
Databricks supports several methods for adding a provider. Use the method that works best for your organization:
Databricks UI¶
You can create a provider directly from the Databricks user interface. Upload the Amperity share credentials directly as part of this process.
Open the Databricks user interface. Open Catalog Explorer, then Delta Sharing, and then Shared with me.
At the bottom of the Shared with me page, click the Import provider directly button. This opens the Import Provider dialog. Give the provider a name, upload the credential for the Amperity share, and then click Import. This opens the providers page.
On the providers page, click Create catalog to add a catalog for the data that is shared from Amperity.
Databricks CLI¶
You can use the Databricks CLI to create a provider in Databricks. The credential file that was downloaded from Amperity is attached as part of the command that creates the provider, which establishes the bridge between Amperity and Databricks.
Open the Databricks CLI in a command window.
Run the databricks providers create command:

$ databricks providers create socktown \
    TOKEN \
    --recipient-profile-str "$(< path/to/config.share)"

where socktown is the name of the provider, TOKEN is the authentication type, and "path/to/config.share" represents the path to the location into which the Amperity credential file was downloaded.

Databricks CLI and Windows environments

If you are running the Databricks CLI from PowerShell, the command is similar to:

databricks providers create socktown TOKEN `
    --recipient-profile-str (Get-Content -Raw path\to\config.share)
If you are running the Databricks CLI from CMD, use a batch script similar to:

setlocal enabledelayedexpansion
set "str="
for /f "usebackq delims=" %%a in ("path\to\config.share") do set "str=!str!%%a"
databricks providers create socktown TOKEN --recipient-profile-str "!str!"
endlocal

When typing these lines interactively rather than running them from a batch file, use %a instead of %%a.
A successful response from Databricks is similar to:

{
  "authentication_type": "TOKEN",
  "created_at": 1714696789105,
  "created_by": "user@socktown.com",
  "name": "socktown",
  "owner": "user@socktown.com",
  "recipient_profile": {
    "endpoint": "URL for Amperity bridge endpoint",
    "share_credentials_version": 1
  },
  "updated_at": 1714696789105,
  "updated_by": "user@socktown.com"
}
Python¶
You can use Python to create a provider by calling the Databricks REST API. This requires the same information to be provided to Databricks as the CLI and is similar to:
import requests

# Placeholders: replace with your workspace name and personal access token.
ACCESS_TOKEN = 'DATABRICKS_PERSONAL_ACCESS_TOKEN'
workspace = 'WORKSPACE_NAME'

# The API expects the contents of the credential file, not the path to it.
with open('path/to/config.share') as f:
    recipient_profile = f.read()

headers = {
    'Authorization': f'Bearer {ACCESS_TOKEN}'
}

endpoint = 'api/2.1/unity-catalog/providers'
url = f'https://{workspace}.cloud.databricks.com/{endpoint}'

data = {
    'name': 'BRIDGE_NAME',
    'authentication_type': 'TOKEN',
    'comment': 'Amperity Bridge',
    'recipient_profile_str': recipient_profile
}

response = requests.post(url, headers=headers, json=data)
response.json()
Verify table sharing¶
Verify that the tables shared from Amperity are available from a catalog in Databricks.
To verify that tables were shared from Amperity to Databricks
From the Catalog Explorer in Databricks, expand Catalog, and then find the catalog that was created for sharing Amperity data.
Open the catalog, and then verify that the tables you shared from Amperity are available in the catalog.
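You can also check the shared tables programmatically. A minimal sketch, assuming the Unity Catalog list-tables REST endpoint; the workspace, catalog, and schema names below are placeholders for your environment, and the helper function is illustrative:

```python
def tables_endpoint(workspace: str, catalog: str, schema: str) -> str:
    """Build the Unity Catalog list-tables URL for a workspace."""
    return (f"https://{workspace}.cloud.databricks.com/api/2.1/unity-catalog/tables"
            f"?catalog_name={catalog}&schema_name={schema}")

url = tables_endpoint("WORKSPACE_NAME", "amperity_catalog", "amperity_schema")
# GET this URL with an "Authorization: Bearer <ACCESS_TOKEN>" header;
# the "tables" array in the JSON response lists the tables shared from Amperity.
```

Pair this with the same access token used when creating the provider to confirm the sync from a script rather than the Catalog Explorer.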