Connect Databricks to Azure Blob Storage

Some organizations store their data in Azure Blob Storage, then use Databricks to give data scientists, engineers, developers, and data analysts within their organization access to that data. Using a combination of Databricks SQL, R, Scala, and/or Python, those teams can build models and tools that support external BI applications and domain-specific tools, helping end users consume the data through the interface they are most comfortable with.

You may send an Apache Parquet, Apache Avro, CSV, or JSON file from Amperity to Azure Blob Storage, and then connect to that data from Databricks.

What is Azure Blob Storage?

Azure Blob Storage is an object storage solution for the cloud that is optimized for storing massive amounts of unstructured data.

Add workflow

Amperity can be configured to send data to Azure Blob Storage, after which Databricks can be configured to connect to Azure Blob Storage and use the Amperity output as a data source.

Important

You must configure Amperity to send data to an Azure Blob Storage instance that your organization manages directly.


To connect Databricks to Azure Blob Storage

Configuring Amperity to send data that is accessible to Databricks from Azure Blob Storage requires completing a series of short workflows, some of which must be done outside of Amperity.

Step 1. Use a query to return the data you want to send to Databricks.

Step 2. Send an Apache Parquet, Apache Avro, CSV, or JSON file to an Azure Blob Storage container from Amperity.

Step 3. Connect Databricks to Azure Blob Storage, and then access the data sent from Amperity.
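For example, the following PySpark sketch shows one way to read the Amperity output from a Databricks notebook. The storage account, container, secret scope, and file path used here are placeholders; substitute the values for your own environment.

```python
# Minimal sketch for reading Amperity output from Azure Blob Storage in a
# Databricks notebook. The `spark` and `dbutils` objects are provided by the
# notebook environment. All names below are placeholders for this example.

storage_account = "mystorageaccount"   # hypothetical storage account name
container = "amperity-output"          # hypothetical container name

# Authenticate with the storage account access key, pulled from a Databricks
# secret scope rather than hard-coded in the notebook.
spark.conf.set(
    f"fs.azure.account.key.{storage_account}.blob.core.windows.net",
    dbutils.secrets.get(scope="azure-blob", key="storage-account-key"),
)

# Read the file that Amperity sent. Swap the reader to match the file format
# you configured (parquet, avro, csv, or json).
path = f"wasbs://{container}@{storage_account}.blob.core.windows.net/customer360/"
df = spark.read.parquet(path)

df.printSchema()
```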

Step 4. Validate the workflow within Amperity and the data within Databricks.
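For example, a few quick checks from a Databricks notebook can confirm that the data arrived as expected. This sketch assumes the DataFrame loaded in the previous step; the column names (amperity_id, email) are illustrative, so adjust them to match the attributes returned by your query.

```python
# Quick validation checks against the DataFrame loaded in the previous step.

row_count = df.count()
print(f"Rows received from Amperity: {row_count}")

# Confirm the schema matches the columns selected in the Amperity query.
df.printSchema()

# Spot-check for duplicate or null identifiers before downstream use.
df.createOrReplaceTempView("amperity_output")
spark.sql("""
    SELECT
        COUNT(*) AS total_rows,
        COUNT(DISTINCT amperity_id) AS distinct_ids,
        SUM(CASE WHEN email IS NULL THEN 1 ELSE 0 END) AS null_emails
    FROM amperity_output
""").show()
```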

Step 5. Configure Amperity to automate this workflow for a regular (daily) refresh of data.