Connect Databricks to Google BigQuery

Some organizations store their data in Google BigQuery and use Databricks to enable data scientists, engineers, developers, and data analysts within their organization to work with that data. Using a combination of Databricks SQL, R, Scala, and/or Python, they can build models and tools that support external BI applications and domain-specific tools, helping end users consume that data through the interface they are most comfortable with.

You can send a CSV file from Amperity to Google Cloud Storage, load that file into Google BigQuery, and then connect to that data from Databricks.

What is Google BigQuery?

Google BigQuery is a fully managed, serverless data warehouse that provides scalable, cost-effective analysis and ANSI SQL querying over petabytes of data.

Add workflow

Amperity can be configured to send data to Google Cloud Storage, after which Google BigQuery is configured to load that data from Google Cloud Storage. Databricks can be configured to connect to Google BigQuery and use the Amperity output as a data source.

Important

You must configure Amperity to send data to a Google Cloud Storage bucket that your organization manages directly.


To connect Databricks to Google BigQuery

Configuring Amperity to send data that is accessible to Databricks from Google BigQuery requires completing a series of steps, some of which must be done outside of Amperity.

Step 1.

Use a query to return the data you want to send to Databricks.

Step 2.

Send a CSV file to Google Cloud Storage from Amperity.
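
Amperity delivers the CSV file to your Cloud Storage bucket as part of its orchestration. If you want to confirm that the file arrived before loading it into BigQuery, the following is a minimal Python sketch using the google-cloud-storage client; the project, bucket, and prefix names are placeholders for values your organization manages.

    from google.cloud import storage

    # List recent objects in the bucket that receives Amperity output.
    # "my-gcp-project", "acme-amperity-output", and "exports/" are hypothetical values.
    client = storage.Client(project="my-gcp-project")
    for blob in client.list_blobs("acme-amperity-output", prefix="exports/"):
        print(blob.name, blob.size, blob.updated)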

Step 3.

Load CSV data from Cloud Storage to Google BigQuery.
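
You can load the file through the Google Cloud console, the bq command-line tool, or a BigQuery client library. The following is a minimal sketch using the google-cloud-bigquery Python client; the bucket path, dataset, and table names are assumptions you would replace with your own.

    from google.cloud import bigquery

    client = bigquery.Client(project="my-gcp-project")

    # Hypothetical source file and destination table.
    uri = "gs://acme-amperity-output/exports/customers.csv"
    table_id = "my-gcp-project.amperity.customers"

    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,   # skip the CSV header row
        autodetect=True,       # infer the schema from the file
        write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
    )

    load_job = client.load_table_from_uri(uri, table_id, job_config=job_config)
    load_job.result()  # wait for the load job to finish

    print(client.get_table(table_id).num_rows, "rows loaded")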

Step 4.

Connect Databricks to Google BigQuery, and then access the data sent from Amperity.
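
Databricks includes a built-in BigQuery data source that authenticates with a Google Cloud service account key configured on the cluster. A minimal PySpark sketch for a Databricks notebook (where spark is predefined) is shown below; the project, dataset, and table names are placeholders.

    # Read the table that BigQuery loaded from the Amperity CSV file.
    # Project, dataset, and table names are hypothetical.
    df = (
        spark.read.format("bigquery")
        .option("parentProject", "my-gcp-project")   # project billed for the read
        .option("table", "my-gcp-project.amperity.customers")
        .load()
    )

    display(df)  # Databricks notebooks; use df.show() elsewhere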

Step 5.

Validate the workflow within Amperity and the data within Databricks.
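
On the Amperity side, validation means confirming that the orchestration completed without errors. On the Databricks side, a few quick checks against the DataFrame from the previous step can confirm that the expected rows and columns arrived; a sketch, again with a hypothetical column name:

    # Basic sanity checks against the data read from BigQuery.
    df.printSchema()                 # confirm the expected columns and types
    print("row count:", df.count())  # compare against the row count in Amperity

    # "amperity_id" is a hypothetical column; substitute a key from your query.
    print("distinct keys:", df.select("amperity_id").distinct().count())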

Step 6.

Configure Amperity to automate this workflow for a regular (daily or weekly) refresh of data.