
From Kaggle to Snowflake

In a previous blogpost I showed how to load .csv files into Snowflake. I downloaded those files manually from Kaggle. In this post I show you how to use the Kaggle API to remove the manual download step.

Install Kaggle

I have used Kaggle in an Anaconda environment. Therefore I have a separate environment in which I installed the Kaggle package.
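For example, creating a dedicated conda environment and installing kaggle with pip (the environment name is just an illustration):

```
conda create -n kaggle python
conda activate kaggle
pip install kaggle
```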

Kaggle API key

First we have to create a Kaggle API key, which is necessary to connect to Kaggle. If you have a Kaggle account, you can create a new API token from your account settings (https://www.kaggle.com/<user_name>/account).

Standard implementation

Clicking the 'Create New API Token' button generates a kaggle.json file. This file needs to be stored in a folder called .kaggle in your home directory. The kaggle.json file has the following structure:
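```json
{
  "username": "<your_kaggle_username>",
  "key": "<your_kaggle_api_key>"
}
```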

In the Python script you can set the Kaggle OS environment variables directly to authenticate, like presented below (a minimal sketch with placeholder values):
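```python
import os

# Set the Kaggle credentials before importing the Kaggle API,
# because the kaggle package authenticates on import.
os.environ['KAGGLE_USERNAME'] = '<your_kaggle_username>'
os.environ['KAGGLE_KEY'] = '<your_kaggle_api_key>'

from kaggle.api.kaggle_api_extended import KaggleApi

api = KaggleApi()
api.authenticate()
```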

Customised example

For this example, I was curious whether I could add the kaggle.json content to the Credentials file I used in my previous example.

Authentication in this customised example goes hand in hand with the authentication to Snowflake. The same Credentials file is referenced for both Snowflake and Kaggle:
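A minimal sketch of what this can look like, assuming a credentials.json with separate snowflake and kaggle sections (the exact layout of your Credentials file may differ):

```python
import json
import os

# Load the shared credentials file; the structure below is an assumption:
# { "snowflake": {"account": "...", "user": "...", "password": "..."},
#   "kaggle":    {"username": "...", "key": "..."} }
with open('credentials.json') as f:
    credentials = json.load(f)

# Expose the Kaggle credentials as environment variables
# before importing the Kaggle API.
os.environ['KAGGLE_USERNAME'] = credentials['kaggle']['username']
os.environ['KAGGLE_KEY'] = credentials['kaggle']['key']

from kaggle.api.kaggle_api_extended import KaggleApi

api = KaggleApi()
api.authenticate()
```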

Download from Kaggle

The next step is downloading files from Kaggle. For this we use the Kaggle API, specifically the dataset_download_files() method.
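For example, using the api object authenticated above (the dataset slug and target path are placeholders):

```python
# Download a dataset as a .zip archive into the 'data' folder.
# 'owner/dataset-name' stands in for the actual Kaggle dataset slug.
api.dataset_download_files('owner/dataset-name', path='data', unzip=False)
```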

Unzip records

Data from Kaggle is downloaded in .zip format. You can unzip the files from within the Kaggle API by passing 'unzip=True' to dataset_download_files().

An alternative is to unzip the files via the statement below:
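A sketch using Python's standard zipfile module (the archive name is a placeholder):

```python
import zipfile

# Extract the downloaded archive into the 'data' folder.
with zipfile.ZipFile('data/dataset-name.zip', 'r') as zip_ref:
    zip_ref.extractall('data')
```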

Continuing

The remainder of the process is similar to the previous post, From .csv to Snowflake (a condensed sketch follows the list below):

  • Reading .csv Data
  • Creating Snowflake objects
  • Loading Data into Snowflake
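As a condensed sketch of those three steps, using pandas and the Snowflake Connector for Python and reusing the credentials dictionary from the customised example (the file and table names are placeholders, and the target table is assumed to exist):

```python
import pandas as pd
import snowflake.connector
from snowflake.connector.pandas_tools import write_pandas

# Read the downloaded .csv data (file name is a placeholder).
df = pd.read_csv('data/dataset-name.csv')

# Connect to Snowflake, reusing the same credentials file.
conn = snowflake.connector.connect(
    account=credentials['snowflake']['account'],
    user=credentials['snowflake']['user'],
    password=credentials['snowflake']['password'],
)

# Load the DataFrame into an existing Snowflake table.
write_pandas(conn, df, table_name='KAGGLE_DATA')
```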

Find the code for this blogpost on GitHub.

Thanks for reading and till next time.

Daan Bakboord – DaAnalytics
