AWS S3 to Snowflake

In the serie; “Third-Partying with Snowflake” we will go and load some data into Snowflake. There are various ways to do that. This blog will cover the selecting data in a AWS S3 bucket and copy these into Snowflake. Check out the Snowflake docs on how to to this. I extended that doc with some my own experiences (and struggles). This will be a step-by-step procedure. You might follow along.

Loading data into AWS S3 Bucket

First we have to load some data into AWS S3. Find some sample data here on Github. Follow the ASW docs to see how to upload data into a S3 Bucket.

Setting up Snowflake to receive the data

Make sure you have an appropriate Database and Schema in place in Snowflake. These objects need the necessary privileges to load the data. Find a script to create the tables and constraints here on Github. Make sure you also have a Snowflake File Format to copy the data. Either via the WebUI or via SQL.

Setup AWS External Stage

Create an External Stage in Snowflake to use to copy the data in the final Snowflake table.

I followed the steps in this blogpost to get and enable the AWS S3 Bucket url.

To execute the above statement, you’ll also need two keys:

  • AWS_KEY_ID
  • AWS_SECRET_KEY

Follow the below steps to retrieve these keys from your AWS instance.

  1. Log in to the AWS Management Console.
  2. Firstly, select your user name at the top right of the page.
  3. Secondly, select the Security Credentials link from the drop-down menu.
  4. Thirdly, select the Access Credentials section, and select the Access Key ID (AWS_KEY_ID).
  5. Click on the Show link in the same row, and select the Secret Access Key (AWS_SECRET_KEY).

Query the stage and verify the AWS S3 Bucket files are there.

Copy data from Stage into Snowflake

Now everything is set to copy the data from stage into a Snowflake table. The command to do so is something similar to the below:

In this particular example, with the dataset I was using, it didn’t go all smoothly. Therefore I had to make some adjustments. Check out the Snowflake docs; ‘Transforming Data During a Load’ for more details. This could lead to the below statement:

Find a script to copy this examples data from stage into Snowflake here on Github.

Validate Data

Do a quick check on the tables to see that the above has worked out for you.

In a few steps I was able to load data from AWS S3 to Snowflake. I will use the data to continue the “Third-Partying with Snowflake”-series and check out the SqlDBM partnership with Snowflake

Thanks for reading. If there is anything regarding this post I can help you out with, please let me know. I will be happy to help you out.

Cheers,

Daan Bakboord

DaAnalytics

DaAnalytics

Hi, ik ben Daan Bakboord. Ik ben een ondernemende Data & Analytics consultant. Mijn passie is het verzamelen, verwerken, opslaan en presenteren van data. Ik help organisaties deze interne en externe data zodanig in te zetten, dat ze in actie kunnen komen. Door het inzetten van betrouwbare data kunnen ondernemers hun ‘gut-feeling’ onderbouwen met feiten.

Leave a Reply

Do NOT follow this link or you will be banned from the site!