Loading BigQuery into Snowflake
Last week I wrote about the newly released Matillion Data Loader (MDL). Matillion Data Loader is a code-free, wizard-based pipeline builder. My first MDL Data Pipeline loaded an Excel sheet into Snowflake. Creating Data Pipelines in Matillion Data Loader is relatively easy and straightforward, so it is time to try something else. This time I will be loading Google BigQuery (GBQ) into Snowflake. Again I will show how to go from Source to Cloud Data Warehouse in a few clicks.
First add the Google BigQuery Pipeline to select data from GBQ.
To be able to load data into Snowflake, you need to make sure that MDL can connect to Snowflake with the correct credentials and with enough privileges. You can therefore re-use a Snowflake Destination Configuration if you created one earlier. I re-use the one I created for the previous blogpost.
In this case I select the ‘DaAnalytics Snowflake’ Destination Config and give the Pipeline a meaningful (at least for me) name.
Google BigQuery
If you want to load data from Google BigQuery, you should specify the following:
- Google OAuth – As I wrote in my previous blogpost, you need Google Cloud Platform credentials to connect to Google BigQuery. Follow the steps in ‘Getting Started with Authentication’ and download the JSON file to your local environment. Then manage the GCP Cloud Credentials in MDL and upload the JSON file.
- ProjectId – In this case: ‘daanalyticsproject’
- DatasetId – In this case: ‘new_york_citibike’
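Together, the ProjectId and DatasetId form the first two parts of BigQuery’s fully qualified table identifier (project.dataset.table). As a small sketch, this is how the values above resolve to the citibike_stations table; the helper function below is my own illustration, not part of MDL:

```python
def table_ref(project_id: str, dataset_id: str, table_id: str) -> str:
    """Build a fully qualified BigQuery table identifier."""
    return f"{project_id}.{dataset_id}.{table_id}"

# The values configured in the MDL wizard:
ref = table_ref("daanalyticsproject", "new_york_citibike", "citibike_stations")
print(ref)  # daanalyticsproject.new_york_citibike.citibike_stations
```

This is the same identifier you would reference when querying the table directly in the BigQuery console.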
Select the Google OAuth and fill in the GBQ-details (Project ID and Dataset ID).
With DatasetId ‘new_york_citibike’ selected, I can select the ‘citibike_stations’ table and its columns as the Data Source.
Staging & Target Configuration
In this step I specify how the tables are named and created in Snowflake. Both a staging and a target table will be created. The Snowflake table name is formed by combining a prefix with the Data Source name. On top of that, we specify which Warehouse to use and in which Database and Schema the table will be created and the data loaded.
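The naming convention (prefix plus Data Source name) can be sketched like this. Note that the STG_/TGT_ prefixes are my assumption, based on the TGT_CITIBIKE_STATIONS table that shows up in Snowflake later; they are whatever you type into the wizard, not fixed MDL defaults:

```python
def snowflake_table_names(source_name: str,
                          staging_prefix: str = "STG_",   # assumed prefix
                          target_prefix: str = "TGT_") -> tuple[str, str]:
    """Derive staging and target table names from a Data Source name."""
    base = source_name.upper()
    return staging_prefix + base, target_prefix + base

staging, target = snowflake_table_names("citibike_stations")
print(staging, target)  # STG_CITIBIKE_STATIONS TGT_CITIBIKE_STATIONS
```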
Almost all steps are now performed. In the last step I can check whether Matillion Data Loader is ready to retrieve rows from the citibike_stations table in Google BigQuery. As you can see on the right, clicking the ‘Test’ button resulted in a ‘Success’ message.
Verify the results in Snowflake
In MDL I received a message that 936 rows have been loaded into Snowflake. I can verify this in Snowflake: the TGT_CITIBIKE_STATIONS table indeed contains 936 rows.
The Matillion Data Loader job performs a series of steps in Snowflake. These steps can be checked via the History tab in Snowflake.
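Both checks can also be done from a Snowflake worksheet instead of the UI. A sketch that builds the two statements I would run, the row count and the recent query history via Snowflake’s information_schema.query_history table function (the table name matches this load; the column selection is just an example):

```python
def verification_queries(table: str, limit: int = 20) -> dict:
    """Build the SQL to verify a load in a Snowflake worksheet."""
    return {
        # Should return 936 for this load:
        "row_count": f"SELECT COUNT(*) FROM {table};",
        # Equivalent of browsing the History tab:
        "history": (
            "SELECT query_text, start_time "
            "FROM TABLE(information_schema.query_history()) "
            f"ORDER BY start_time DESC LIMIT {limit};"
        ),
    }

queries = verification_queries("TGT_CITIBIKE_STATIONS")
print(queries["row_count"])  # SELECT COUNT(*) FROM TGT_CITIBIKE_STATIONS;
```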
Just as in the previous blogpost, creating a Data Pipeline in Matillion Data Loader is relatively easy. The Google BigQuery Pipeline also needs some preparation at both ends; after that it is a matter of selecting and configuring, and most things speak for themselves. One downside of MDL is that you cannot reconfigure a Data Pipeline: if something does not work as expected, you need to start all over again. Luckily, creating Data Pipelines in Matillion Data Loader is not that hard.
Thanks for reading.
Get started with Matillion Data Loader: https://dataloader.matillion.com