Send data to Databricks Delta table

A Delta table is a table in a Delta Lake, which is an optimized storage layer that provides the foundation for storing data and tables in the Databricks Lakehouse Platform. Delta Lake is the default storage format for all operations on Databricks. Unless otherwise specified, all tables on Databricks are Delta tables.

Use data in a Databricks Delta table to run advanced round-trip SQL queries and to build models against your data. Access this data using Databricks SQL or by connecting external data visualization tools, such as Tableau and Domo, using a JDBC connection.

This topic describes the steps that are required to send the results of a query or a database export to Databricks Delta table from Amperity:

  1. Get details

  2. Add destination

  3. Add data template

How this destination works

Amperity can send the results of a query or a database export to Databricks. Results are uploaded to cloud storage as NDJSON files, loaded into Databricks from cloud storage using a COPY INTO command, and then made available in your Databricks data warehouse as a Delta table.
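The load step is conceptually similar to the following Databricks SQL statement. This is a minimal sketch: the table name and storage path are placeholders, and Amperity generates and runs the actual COPY INTO statement for you.

    COPY INTO your_catalog.your_schema.amperity_results
    FROM 's3://your-tenant-bucket/path/to/results/'
    FILEFORMAT = JSON

COPY INTO is idempotent in Databricks: files that have already been loaded are skipped on subsequent runs.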


A Databricks destination works like this:

  1. Use a query to return the data you want to send to Databricks or configure a database export.

    Note

    Amperity maps to the following data types in Databricks Delta table:

    Amperity data type    Databricks data type
    ------------------    --------------------
    Boolean               BOOLEAN
    Date                  DATE
    Datetime              TIMESTAMP
    Decimal               DECIMAL(38,3)
    Float                 FLOAT
    Integer               INT
    String                STRING

  2. Configure the Databricks destination in Amperity.

    Note

    The results of a query or the contents of a database export are sent as NDJSON files to the cloud storage that is included with your tenant, either Amazon S3 or Azure Blob Storage. This step happens automatically and does not require any configuration.

    This destination then uses token-based JDBC authentication to run a COPY INTO operation that copies the contents of the NDJSON file to Databricks, after which that data is available from a Delta table in Databricks.

  3. Send a test from Amperity, and then from within Databricks verify that data is available and that you can run SQL queries against it. (An example verification query appears after these steps.)

    Important

    Amperity will create the Delta table in Databricks if it does not exist.

    When the table does exist, Amperity compares the schema of the Databricks Delta table to the schema of the NDJSON file that was sent from Amperity to cloud storage, and then runs a COPY INTO operation for columns that have matching column names and matching data types. Columns that do not match exactly are ignored.

  4. Use Databricks SQL to run ad-hoc queries and create dashboards on data stored in your table from within Databricks. Use a JDBC connection to access this data from external data visualization tools, such as Tableau, Domo, or Looker.

  5. Configure Amperity to automate this workflow for a regular (daily or weekly) refresh of data.
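After a test run, you can verify the results from a Databricks SQL editor. A minimal sketch, where the catalog, schema, and table names are placeholders for the names configured in your workspace:

    -- Confirm the table was created with the expected schema
    -- (see the data type mapping above)
    DESCRIBE TABLE your_catalog.your_schema.amperity_results;

    -- Spot-check a few rows
    SELECT * FROM your_catalog.your_schema.amperity_results LIMIT 10;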

Get details

Databricks Delta table requires the following configuration details:

  1. The Server Hostname for your Databricks warehouse. For example: “acme.cloud.databricks.com”.

  2. The HTTP Path to your data warehouse. For example: “/sql/1.0/warehouses/1abc2d3456e7f890a”.

  3. A Personal Access Token to allow access to Databricks. A personal access token is a generated string. For example: “dapi1234567890b2cd34ef5a67bc8de90fa12b”.

    Important

    Databricks recommends using personal access tokens that belong to service principals instead of workspace users.

You can find your Databricks connection details from the Databricks user interface.
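External tools typically combine these details into a JDBC connection string. The following sketch uses the placeholder values from the examples above and token-based authentication (AuthMech=3 with UID=token), which is how current Databricks JDBC drivers express personal access token authentication; check the documentation for your driver version before relying on this exact format.

    jdbc:databricks://acme.cloud.databricks.com:443/default;transportMode=http;ssl=1;httpPath=/sql/1.0/warehouses/1abc2d3456e7f890a;AuthMech=3;UID=token;PWD=<personal-access-token>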

Add destination

Configure Amperity to send data to Databricks Delta table using the COPY INTO operation.

To add a destination

  1. From the Destinations tab, click Add Destination. This opens the Add Destination dialog box.

  2. Enter the name of the destination and a description. For example: “Databricks Delta table” and “This sends the results of a query or a database export to Databricks Delta table”.

  3. From the Plugin drop-down, select Databricks Delta table.

  4. The “databricks” credential type is selected automatically.

  5. From the Credential drop-down, select Create a new credential. This opens the Create New Credential dialog box.

  6. Enter a name for the credential along with server hostname, HTTP path, and personal access token. Click Save.

  7. Select Allow customers to use this destination.

  8. Select Allow orchestrations from users with limited PII access. (A user with limited PII access has been assigned the Restrict PII Access policy option.)

  9. Click Save.

Add data template

A data template defines how columns in Amperity data structures are sent to downstream workflows. A data template is part of the configuration for sending query and segment results from Amperity to an external location.

You can configure Amperity to send query results or a database export to Databricks. These results are sent using an orchestration and will include all columns that were specified in the query.
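For example, a query similar to the following sketch (the table and column names are placeholders for your tenant's data model) would produce a Delta table in Databricks with the columns email, given_name, and family_name:

    SELECT email, given_name, family_name
    FROM Customer360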

To add a data template for orchestrations

  1. From the Destinations tab, open the menu for a destination that is configured for Databricks Delta table, and then select Add data template.

    This opens the Add Data Template dialog box.

  2. Enter the name of the data template and a description. For example: “Databricks Delta table customer profiles” and “Send email addresses and customer profiles to Databricks Delta table.”

  3. Enable the Allow customers to use this data template option. This allows users to build queries, and then configure orchestrations that send results from Amperity to a configured destination.

  4. Optional. Enable the Allow orchestrations from customers with limited PII access option. This allows users who have been assigned the Restrict PII Access policy option to send results from Amperity.

  5. Verify all template settings and make any required updates.

  6. Click Save.