Loading BigQuery into Snowflake

Last week I wrote about the newly released Matillion Data Loader (MDL), a code-free, wizard-based pipeline builder. My first MDL Data Pipeline loaded an Excel sheet into Snowflake, and creating Data Pipelines in Matillion Data Loader turned out to be relatively easy and straightforward. Time to try something else. This time I will be loading Google BigQuery (GBQ) into Snowflake. Again I will show how to go from Source to Cloud Data Warehouse in a few clicks.

Add Pipeline

First, add a Google BigQuery Pipeline to select data from GBQ.

Add a Google BigQuery Pipeline in Matillion Data Loader

Destination

To load data into Snowflake, you need to make sure that MDL can connect to Snowflake with the correct credentials and with sufficient privileges. You can therefore re-use a Snowflake Destination Configuration if you created one earlier; I re-use the one I created for the previous blogpost.
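
As a quick sanity check outside MDL, you could verify those credentials and privileges yourself. Below is a minimal sketch using the snowflake-connector-python package; all account, user, role, warehouse, database and schema names are hypothetical placeholders.

```python
# Minimal sketch (not MDL itself): verify that the credentials you plan to
# use for the Destination Configuration can connect and have privileges.
# All identifiers below are hypothetical placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="xy12345.eu-west-1",      # hypothetical account identifier
    user="MDL_USER",                  # hypothetical user
    password="********",
    role="MDL_ROLE",                  # hypothetical role
    warehouse="COMPUTE_WH",
    database="DAANALYTICS_DB",
    schema="PUBLIC",
)

cur = conn.cursor()
# List the grants on the role MDL will use; it needs at least USAGE on the
# warehouse, database and schema, plus CREATE TABLE and INSERT on the schema.
for row in cur.execute("SHOW GRANTS TO ROLE MDL_ROLE"):
    print(row)
conn.close()
```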

In this case I can select the ‘DaAnalytics Snowflake’ Destination Configuration. I give the Pipeline a meaningful (at least to me) name.

Add a Google BigQuery Pipeline in Matillion Data Loader

Connection details

Google BigQuery

To load data from Google BigQuery, you should specify the following:

  • Google OAuth – As I wrote in my previous blogpost, you will need Google Cloud Platform credentials to connect to Google BigQuery. Follow the steps in the ‘Getting Started with Authentication’ guide and download the JSON file to your local environment. Manage the GCP Cloud Credentials in MDL and upload the JSON file (a connection sketch follows this list).
  • Project ID – in this case ‘daanalyticsproject’
  • Dataset ID – in this case ‘new_york_citibike’
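
For reference, this is roughly what those three pieces of information amount to in code. A minimal sketch using the google-cloud-bigquery Python client; the path to the JSON file is hypothetical, while the project and dataset IDs are the ones used in this post.

```python
# Minimal sketch, not part of MDL: authenticate with the downloaded
# service-account JSON file and point a client at the project and dataset.
from google.cloud import bigquery

client = bigquery.Client.from_service_account_json(
    "/path/to/credentials.json",      # hypothetical path to the downloaded JSON file
    project="daanalyticsproject",     # Project ID
)

dataset = client.get_dataset("daanalyticsproject.new_york_citibike")  # Dataset ID
print(dataset.full_dataset_id)
```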

Check Google BigQuery details

Select the Google OAuth credentials and fill in the GBQ details (Project ID and Dataset ID).

Add Google BigQuery details in Matillion Data Loader

Data Sources

Add a Google BigQuery data source in Matillion Data Loader

After selecting Dataset ID ‘new_york_citibike’, I can select the ‘citibike_stations’ table and the columns in this Data Source.

Select Google BigQuery data source columns in Matillion Data Loader
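
This selection step is essentially browsing the dataset’s metadata. Here is a sketch of the same lookup via the BigQuery client, again with a hypothetical credentials file:

```python
# List the tables in the dataset and the columns of 'citibike_stations'.
from google.cloud import bigquery

client = bigquery.Client.from_service_account_json("/path/to/credentials.json")

# The tables MDL offers as Data Sources for this Dataset ID.
for item in client.list_tables("daanalyticsproject.new_york_citibike"):
    print(item.table_id)

# The columns MDL offers for the selected table.
table = client.get_table("daanalyticsproject.new_york_citibike.citibike_stations")
for field in table.schema:
    print(field.name, field.field_type)
```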

Staging & Target Configuration

In this step I specify how the tables are named and created in Snowflake. Both a staging table and a target table will be created. Combining a prefix with the Data Source name gives the Snowflake table name. On top of that, we specify which Warehouse to use and in which Database and Schema the tables will be created and the data will be loaded.

Define Snowflake STG & TGT in Matillion Data Loader
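
The naming convention is simple enough to express in a couple of lines. In this sketch the ‘TGT_’ prefix matches the target table we will see later; the ‘STG_’ prefix for the staging table is my assumption.

```python
# Prefix + Data Source name, upper-cased, gives the Snowflake table name.
def snowflake_table_name(prefix: str, source_name: str) -> str:
    return f"{prefix}{source_name}".upper()

print(snowflake_table_name("STG_", "citibike_stations"))  # STG_CITIBIKE_STATIONS (assumed prefix)
print(snowflake_table_name("TGT_", "citibike_stations"))  # TGT_CITIBIKE_STATIONS
```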

Finish

Almost all steps have been performed. In the last step I can check whether Matillion Data Loader is ready to retrieve rows from the citibike_stations table in Google BigQuery. As you can see on the right, clicking the ‘Test’ button resulted in a ‘Success’ message.

Finish a Google BigQuery Pipeline in Matillion Data Loader

Verify the results in Snowflake

In MDL I received a message that 936 rows had been loaded into Snowflake. I can verify this in Snowflake itself, where the TGT_CITIBIKE_STATIONS table indeed contains 936 rows.

Check results in Snowflake
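
A minimal sketch of that verification query, using the same hypothetical connection parameters as in the Destination sketch earlier:

```python
# Count the rows MDL loaded into the target table.
import snowflake.connector

conn = snowflake.connector.connect(
    account="xy12345.eu-west-1", user="MDL_USER", password="********",
    warehouse="COMPUTE_WH", database="DAANALYTICS_DB", schema="PUBLIC",
)
cur = conn.cursor()
cur.execute("SELECT COUNT(*) FROM TGT_CITIBIKE_STATIONS")
print(cur.fetchone()[0])  # expected: 936
conn.close()
```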

Snowflake History

The Matillion Data Loader job performs a series of steps in Snowflake. These steps can be checked via the History tab in the Snowflake web interface.

Check history in Snowflake
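
If you prefer SQL over the web interface, the same information is available through Snowflake’s INFORMATION_SCHEMA.QUERY_HISTORY table function. A sketch, reusing the hypothetical connection parameters from above:

```python
# Inspect the recent queries the MDL job ran, newest first.
import snowflake.connector

conn = snowflake.connector.connect(
    account="xy12345.eu-west-1", user="MDL_USER", password="********",
    warehouse="COMPUTE_WH", database="DAANALYTICS_DB", schema="PUBLIC",
)
cur = conn.cursor()
cur.execute("""
    SELECT query_text, start_time, execution_status
    FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY())
    ORDER BY start_time DESC
    LIMIT 20
""")
for query_text, start_time, status in cur.fetchall():
    print(start_time, status, query_text[:80])
conn.close()
```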

Summary

Just as in the previous blogpost, creating a Data Pipeline in Matillion Data Loader is relatively easy. The Google BigQuery Pipeline also needs some preparation at both ends; after that it is a matter of selecting and configuring, and most things speak for themselves. One downside of MDL is that you cannot reconfigure a Data Pipeline: if something does not work as expected, you have to start all over again. Luckily, creating Data Pipelines in Matillion Data Loader is not that hard.

Thanks for reading.

Get started with Matillion Data Loader: https://dataloader.matillion.com

DaAnalytics


Hi, I am Daan Bakboord. I am an entrepreneurial Data & Analytics consultant. My passion is collecting, processing, storing and presenting data. I help organisations use this internal and external data in such a way that they can take action. By using reliable data, entrepreneurs can back up their gut feeling with facts.
