Launching Matillion Data Loader

As you might know I am managing the Data & Analytics unit at Pong. One of my responsibilities is setting the direction of this unit. One of our focus points is Modern Cloud Analytics. Therefor we have setup a few partnerships focussed on the Snowflake Eco-system. You can find out more about these partnerships; here. Unfortunately only in Dutch, but Google Translate might help a bit. Matillion is one of those partners from the Snowflake Eco-system. A few weeks ago I found out that Matillion is launching Matillion Data Loader (MDL); a free Software-as-a-Service (SaaS) Data Integration solution.

“Code-free, Wizard-based Pipeline builder”

With Matillion Data Loader you can:

  1. Easily and quickly load data into a Cloud Data Warehouse (Snowflake, Google Big Query and Amazon Redshift) using a Code-free, Wizard-based Pipeline builder,
  2. Using Scale & Speed capabilities in a Serverless environment,
  3. Monitoring jobs from a central dashboard,
  4. Upgrade to Matillion ETL for Advanced Data Transformation, Data Integration, and Job Orchestration.

I was one of the lucky ones to receive an invite to play around with Matillion Data Loader. Starting was relatively easy;

  1. Sign-up and activate my account,
  2. Start building Data Pipelines
  3. Start loading Data into Snowflake.

From Source to Cloud Data Warehouse in a few clicks

In this blog I will show you my first steps on the Matillion Data Loader path. I will use MDL to load data from an Excel-sheet into Snowflake (SF). The Excel-sheet is located in a Google Cloud Platform (GCP) bucket.

GCP Cloud Credentials

Matillion Data Loader must be able to select the Excel-sheet from the GCP-bucket. There for it is necessary to specify the Google Cloud Credentials. These can be downloaded from GCP. Follow the steps in the; Getting Started with Authentication and download the JSON-file to your local environment. In MDL you can manage the GCP Cloud Credentials and upload the JSON-file.

Google Cloud Credentials

Snowflake Destination Configuration

To be able to load data into Snowflake, you need to make sure that MDL can connect to Snowflake with the correct credentials with enough privileges (e.g. ‘CREATE VIEW’-privileges in SF).

Destination Warehouse Config

The two endpoints are in place to setup. There is a GCP-bucket where the Excel-file lives and where we have access to. At the other end, there is a Snowflake Database with a Schema where we are allowed to create a table and load data into. What’s left is creating an MDL Pipeline

Matillion Data Loader Pipeline

We need to create a Data Pipeline to get started. For the purpose of this blog, we choose an Excel-sheet as the Source. Snowflake will be the Target Destination Warehouse.

Matillion Data Loader Pipeline

Add the Pipeline

We select Snowflake as the Destination Warehouse. Therefore we can select the Destination Configuration we created earlier. The Pipeline gets a meaningful name.

Add Matilion Data Loader Pipeline

Google Cloud Storage

In a next step we need to provide the URI of the GCP-Bucket. Be carful. When you copy the URI it seems to include a space at the beginning. If you do not remove this space when you enter the Connection URL in Matillion Data Loader, you will get an error; “That URL is not a valid GS connection URL”. Copy the URI to the next step in the MDL Data Pipeline creation.

Google Cloud Storage

Google Cloud Connection Details

To be able to load the Excel-file from the Google Cloud Storage, we need to specify the Connection details. This means we select the Google Storage Type. After that we can select the GCP Credentials we created earlier. Finally we fill in the copied URI from the previous step. Make sure it has the right format and no spaces at the beginning.

GCP Connection Details

Data Sources

Without specifying Connection Options, we go straight to te selection of the Data Sources. First click the ‘+’-sign and then select the appropriate sheet. Make sure you give the sheet in Excel a meaningful name. This name will be used to name the tabel in Snowflake.

Data Pipeline Data Sources

Data Selection

In the following step, you select the columns you want to add to the table in Snowflake. By default all columns are selected. Also the empty ones.

Excel Data Selection

Snowflake Configuration

We arrived at the fifth step in the Data Pipeline setup. Here we are going to tell Snowflake which table we are going to create by specifying a prefix. Combined with the sheet name, this will be the Target Table name. On top of that we specify which Warehouse to use and in which Database Schema we are going to create the table and load the data.

Snowflake Configuration

Verify the results in Snowflake

If everything works out like planned the MDL Data Pipeline will run and load the data into Snowflake. We can head over there and verify the results. As we can see there is a STG_CHANNELS table in the DAANALYTICS-database in the STAGING-schema. The columns match the columns specified in the sheet. Also the data is exactly the same as in the sheet. Apart from to Matillion-specific metadata columns.

Snowflake result

Summary

Creating a Data Pipeline in Matillion is relatively easy. It needs some preparation at both ends. After that it is a matter of selecting and configuring. Most things speak for itself. One downside of MDL is that you cannot reconfigure a Data Pipeline. If something is not working like expected, you need to start all over again. Luckily enough creating Data Pipelines in Matillion Data Loader is not that hard.

Thanks for reading.

DaAnalytics

DaAnalytics

Hi, ik ben Daan Bakboord. Ik ben een ondernemende Data & Analytics consultant. Mijn passie is het verzamelen, verwerken, opslaan en presenteren van data. Ik help organisaties deze interne en externe data zodanig in te zetten, dat ze in actie kunnen komen. Door het inzetten van betrouwbare data kunnen ondernemers hun ‘gut-feeling’ onderbouwen met feiten.

One Comment

Leave a Reply

Do NOT follow this link or you will be banned from the site!