Send query results to Databricks

Amperity can send the results of a query or a database export to Databricks. The results are written to cloud storage as NDJSON files and then loaded into your Databricks data warehouse using a COPY INTO command.
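
Amperity manages this load step for you. As a rough sketch only, a COPY INTO statement of this shape could load staged NDJSON files into a Databricks table. The table name and staging path here are illustrative, not values from your tenant:

    COPY INTO customer_profiles
    FROM 's3://example-staging-bucket/amperity/query-results/'
    FILEFORMAT = JSON
    FORMAT_OPTIONS ('multiLine' = 'false') -- NDJSON: one JSON record per line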

This topic describes the steps required to send the results of a query or a database export from Amperity to Databricks:

  1. Build a query

  2. Add orchestration

  3. Run orchestration

  4. Access tables in Databricks

Note

Databricks must be enabled before you can configure an orchestration to send query results. Ask your DataGrid Operator or Amperity representative to enable Databricks for your tenant.

Build query or database export

You will need to build a query that outputs the data that you want to make available in Databricks, or configure a database export that contains one or more tables.
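
For example, a query that selects the attributes to send to Databricks might look like the following sketch. The Customer_360 table and the column names are illustrative and are not guaranteed to exist in your tenant:

    SELECT
        amperity_id
        , email
        , is_active
        , birthdate
        , updated_at
        , lifetime_value
        , order_count
    FROM Customer_360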

Databricks does not have specific schema requirements, other than requiring that the data sent from Amperity matches the schema that is defined in Databricks.

Amperity will create the table in Databricks if it does not exist.

When a table does exist, Amperity will verify the schema in Databricks, compare it to the schema of the NDJSON file that was sent from Amperity to cloud storage, and then run a COPY INTO operation for columns that have matching names and data types. Columns that do not match exactly are ignored.

Amperity maps to the following data types in Databricks:

Amperity data type | Databricks data type
------------------ | --------------------
Boolean            | BOOLEAN
Date               | DATE
Datetime           | TIMESTAMP
Decimal            | DECIMAL(38,3)
Float              | FLOAT
Integer            | INT
String             | STRING
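
When Amperity creates the table for you, columns are defined with the mapped Databricks types shown above. A hypothetical table matching the query sketched earlier might be created like this:

    CREATE TABLE IF NOT EXISTS customer_profiles (
        amperity_id    STRING,
        email          STRING,
        is_active      BOOLEAN,
        birthdate      DATE,
        updated_at     TIMESTAMP,
        lifetime_value DECIMAL(38,3),
        order_count    INT
    );

If an existing column were defined as BIGINT instead of INT, its name would match but its type would not, so that column would be ignored during the COPY INTO operation.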

Add orchestration

An orchestration defines the relationship between query results and a destination, including the location to which those query results will be sent and the frequency at which the orchestration will be run.

To add an orchestration

  1. From the Destinations tab, click Add Orchestration. This opens the Add Orchestration dialog box.

  2. From the Object Type drop-down, select Query.

  3. From the Object drop-down, select the query for which results will be sent to Databricks.

  4. From the Destination drop-down, select a destination that is configured for sending data to Databricks.

  5. From the Data Template drop-down, select a data template.

  6. Verify all settings.

  7. Set the workflow to Manual. (You can change this to automatic later, after verifying the end-to-end workflow.)

  8. Click Save.

Run orchestration

Run the orchestration manually to validate that it works.

To run the orchestration

  1. From the Destinations tab, under Orchestrations, open the menu for the Databricks orchestration, and then select Run.

  2. The Status column for the orchestration will update to say “Waiting to start…”, after which the notifications pane will update to include a notification that shows the current status.

  3. When the orchestration has run successfully, the status is updated to “Completed”.

Access tables

After your data has been loaded into Databricks, you can access it and use it with your workflows and use cases. For example:

  1. Use Databricks SQL to run ad-hoc queries and create dashboards on data stored in your table from within Databricks (see the sketch after this list).

  2. Use a JDBC connection to access this data from external data visualization tools, such as Tableau, Domo, and Looker.
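
As an illustration of the first option, an ad-hoc Databricks SQL query over the hypothetical customer_profiles table from earlier might summarize customers by status. The same statement could also be issued over a JDBC connection from a tool such as Tableau:

    SELECT
        is_active,
        COUNT(*) AS customers,
        SUM(lifetime_value) AS total_lifetime_value
    FROM customer_profiles
    GROUP BY is_active;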