Send data to Databricks Delta table¶
A Delta table is a table in a Delta Lake, which is an optimized storage layer that provides the foundation for storing data and tables in the Databricks Lakehouse Platform. Delta Lake is the default storage format for all operations on Databricks. Unless otherwise specified, all tables on Databricks are Delta tables.
Use data in a Databricks Delta table to build advanced round-trip SQL queries and to build models against your data. Access this data using Databricks SQL or by connecting external data visualization tools, such as Tableau and Domo, using a JDBC connection.
This topic describes the steps that are required to send the results of a query or a database export from Amperity to Databricks Delta table.
How this destination works¶
Amperity can send the results of a query or a database export to Databricks Delta table. The results are uploaded to cloud storage as NDJSON files, then loaded from cloud storage into Databricks using a COPY INTO operation, after which that data is available in your Databricks data warehouse as a Delta table.
Important
COPY_OPTIONS is a parameter for the COPY INTO operation. Amperity sets the value for COPY_OPTIONS to 'force' = 'true', which means that files are loaded regardless of whether they’ve been loaded before.
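The following is a minimal sketch of the COPY INTO operation this destination runs on your behalf; the table name and storage path are hypothetical, and the exact statement Amperity generates may differ:

    COPY INTO main.amperity.customer_profiles
    FROM 's3://tenant-bucket/amperity-exports/customer_profiles/'
    FILEFORMAT = JSON
    COPY_OPTIONS ('force' = 'true');

Because 'force' = 'true' disables file-level deduplication, loading the same file twice inserts its rows twice.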
A Databricks Delta table destination works like this:
Use a query to return the data you want to send to Databricks Delta table or configure a database export.
Note
Amperity maps to the following data types in Databricks Delta table:
Amperity data type    Databricks data type
Boolean               BOOLEAN
Date                  DATE
Datetime              TIMESTAMP
Decimal               DECIMAL(38,3)
Float                 FLOAT
Integer               INT
String                STRING
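As an illustration of those mappings, a table that Amperity creates might look like the following; the column names are hypothetical:

    CREATE TABLE IF NOT EXISTS main.amperity.customer_profiles (
      amperity_id    STRING,        -- Amperity String
      is_subscribed  BOOLEAN,       -- Amperity Boolean
      birthdate      DATE,          -- Amperity Date
      updated_at     TIMESTAMP,     -- Amperity Datetime
      lifetime_value DECIMAL(38,3), -- Amperity Decimal
      engagement     FLOAT,         -- Amperity Float
      order_count    INT            -- Amperity Integer
    );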
Configure the Databricks destination in Amperity.
The results of a query or the contents of a database export are sent as NDJSON files to the cloud storage that is included with your tenant, either Amazon S3 or Azure Blob Storage. This step happens automatically and does not require any configuration.
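NDJSON (newline-delimited JSON) files contain one JSON object per line. A hypothetical two-row example of what an exported file might contain:

    {"amperity_id": "abc-123", "is_subscribed": true, "order_count": 4}
    {"amperity_id": "def-456", "is_subscribed": false, "order_count": 1}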
Note
This destination uses token-based JDBC authentication to run a COPY INTO operation that copies the contents of the NDJSON files to Databricks, after which that data is available as a Delta table in Databricks.
Send a test from Amperity, and then from within Databricks verify that data is available and that you can run SQL queries against it.
Important
Amperity will create the table in Databricks Delta table if it does not exist.
When a table does exist, Amperity verifies the schema in Databricks, compares it to the schema of the NDJSON file that was sent from Amperity to cloud storage, and then runs a COPY INTO operation for columns that have matching column names and matching data types. Columns that do not match exactly are ignored.
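A quick way to check both conditions from within Databricks; the table name is hypothetical:

    -- Confirm the column names and data types that Amperity will compare against
    DESCRIBE TABLE main.amperity.customer_profiles;

    -- Confirm that rows arrived after a test send
    SELECT COUNT(*) FROM main.amperity.customer_profiles;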
Use Databricks SQL to run ad-hoc queries and create dashboards on data stored in your table from within Databricks. Use a JDBC connection to access this data from external data visualization tools, such as Tableau, Domo, or Looker. (See the example query after these steps.)
Configure Amperity to automate this workflow for a regular (daily or weekly) refresh of data.
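A hypothetical ad-hoc query you might run from Databricks SQL once data is flowing:

    SELECT is_subscribed, COUNT(*) AS customers
    FROM main.amperity.customer_profiles
    GROUP BY is_subscribed;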
Get details¶
Databricks Delta table requires the following configuration details:
The Server Hostname for your Databricks Delta table data warehouse. For example: “acme.cloud.databricks.com”.
The HTTP Path to your data warehouse. For example: “/sql/1.0/warehouses/1abc2d3456e7f890a”.
A Personal Access Token to allow access to Databricks Delta table. A personal access token is a generated string. For example: “dapi1234567890b2cd34ef5a67bc8de90fa12b”.
Important
Databricks recommends using a personal access token that belongs to a service principal instead of a workspace user.
You can find your Databricks connection details in the Databricks workspace.
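The same details can also be assembled into a JDBC URL for external tools. A sketch, assuming the Databricks JDBC driver and the example values above; verify the exact format against your driver’s documentation:

    jdbc:databricks://acme.cloud.databricks.com:443;httpPath=/sql/1.0/warehouses/1abc2d3456e7f890a;AuthMech=3;UID=token;PWD=<personal-access-token>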
Add destination¶
Configure Amperity to send the results of a query or a database export directly to Databricks Delta table.
To add a destination
Open the Destinations tab to configure a destination for Databricks Delta table. Click the Add Destination button to open the Destination dialog box. Enter a name for the destination and provide a description. For example: “Databricks Delta table” and “This sends the results of a query or the database export to Databricks Delta table”. From the Plugin drop-down, start typing “data” to filter the list, and then select Databricks Delta table.
Credentials allow Amperity to connect to Databricks Delta table. The credential type is set automatically. You may use an existing credential or you may add a new one. Select an existing credential from the Credential drop-down. – or – Select Create a new credential from the Credential drop-down. This opens the Credential dialog box. Enter the name for the credential, and then add a description. Databricks Delta table has the following credential settings: Server Hostname, HTTP Path, and Personal Access Token.
When finished, click Save.
Each destination has settings that define how Amperity will deliver data to Databricks Delta table. These settings are listed under the Settings section of the Destination dialog box. Enter the Catalog name that will be updated by Amperity. (This value may be specified from the data template.)
Business users are assigned to the Amp360 User and/or AmpIQ User policies. (Amp360 User allows access to queries and orchestrations; AmpIQ User allows access to segments and campaigns.) A business user cannot select a destination that is not visible to them. Business users – including users assigned to the DataGrid Operator policy – may have restricted access to PII. What is restricted access to PII? The Restrict PII access policy option prevents users who are assigned to it from viewing data that is marked as PII anywhere in Amperity and from sending that data to any downstream workflow. You can make this destination visible to orchestrations and allow users with restricted access to PII to use this destination by enabling one (or both) of the corresponding options.
Review all settings, and then click Save.
Important
You must configure a data template for this destination before you can send data to Databricks Delta table.
Add data template¶
A data template defines how columns in Amperity data structures are sent to downstream workflows. A data template is part of the configuration for sending query and segment results from Amperity to an external location.
To add a data template
From the Destinations tab, open the menu for a destination that is configured for Databricks Delta table, and then select Add data template. This opens the Add Data Template dialog box.
Verify business user access to queries and orchestrations and access to segments and campaigns. A business user may also have restricted access to PII, which prevents them from viewing and sending customer profile data. If business user access was not configured as part of the destination, you may configure access from the data template.
Verify all configuration settings. Enter the Catalog name that will be updated by Amperity. (This value may be specified from the destination.)
Under Template Settings, enter the catalog, database, and table name.
Caution
The database name must be a valid database. A table will be created if one does not exist. The catalog name is optional.
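One way to confirm that the database exists before saving the template, run from Databricks SQL; the catalog name is hypothetical (in Databricks SQL, SCHEMA and DATABASE are interchangeable keywords):

    SHOW SCHEMAS IN main;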
Review all settings, and then click Save. After you have saved the data template, and depending on how you configured it, business users can send query results or a database export to Databricks Delta table.