Daanalytics

From Kaggle to Snowflake

In a previous blogpost I showed how to load .csv files into Snowflake. I downloaded these files manually from Kaggle. In this post I show how to use the Kaggle API to automate that download.

Install Kaggle

I use Kaggle in an Anaconda environment, so I created a separate environment in which I installed the kaggle package.

Kaggle API key

First we have to create a Kaggle API key, which is necessary to connect to Kaggle. If you have a Kaggle account, you can create a new API token from your account settings (https://www.kaggle.com/<user_name>/account).

Standard implementation

Clicking the button to create a new API token generates a kaggle.json file. This file needs to be stored in a folder called .kaggle in your home directory. The kaggle.json file contains two fields: username and key.
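
To make the structure concrete, here is a minimal sketch that reads such a file and checks that both fields are present. The function name load_kaggle_token and the constant DEFAULT_TOKEN_PATH are my own illustrative names, not part of the Kaggle API.

```python
import json
from pathlib import Path

# Location described above: a .kaggle folder in the home directory.
DEFAULT_TOKEN_PATH = Path.home() / ".kaggle" / "kaggle.json"


def load_kaggle_token(path=DEFAULT_TOKEN_PATH):
    """Read a kaggle.json file and return its username and key (hypothetical helper)."""
    with open(path, encoding="utf-8") as handle:
        token = json.load(handle)

    # The token file is expected to hold exactly these two fields.
    missing = {"username", "key"} - set(token)
    if missing:
        raise KeyError(f"kaggle.json is missing: {sorted(missing)}")

    return {"username": token["username"], "key": token["key"]}
```

Calling load_kaggle_token() without arguments reads the file from the default location; passing a path is handy when the file lives somewhere else.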

In the Python script you can also authenticate directly through OS environment variables.
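
A minimal sketch of that approach, assuming the Kaggle client picks up the KAGGLE_USERNAME and KAGGLE_KEY environment variables. The two function names are my own, and the values in the usage comment are placeholders. As far as I know, importing the kaggle package already triggers authentication, which is why the import is postponed until after the variables are set.

```python
import os


def set_kaggle_environment(username, key):
    """Expose the Kaggle credentials as environment variables (hypothetical helper)."""
    os.environ["KAGGLE_USERNAME"] = username
    os.environ["KAGGLE_KEY"] = key


def authenticate_kaggle():
    """Return an authenticated Kaggle API client.

    The import happens here, after the environment variables are set,
    because the kaggle package authenticates as soon as it is imported.
    """
    from kaggle.api.kaggle_api_extended import KaggleApi

    api = KaggleApi()
    api.authenticate()
    return api


# Usage (placeholder values, not real credentials):
# set_kaggle_environment("<your_kaggle_username>", "<your_kaggle_api_key>")
# api = authenticate_kaggle()
```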

Customised example

For this example, I was curious whether I could include the kaggle.json content in the credentials file I used in my previous example.

Authentication in this customised example goes hand in hand with the authentication to Snowflake. The same credentials file is referenced for both Snowflake and Kaggle.
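
The idea can be sketched as follows. The layout of the credentials file is an assumption on my side: a JSON file with a "Snowflake" section and a "Kaggle" section, where the latter holds the username and key from kaggle.json. The function names are illustrative as well.

```python
import json
import os


def load_credentials(path):
    """Load the shared credentials file (assumed to be JSON)."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def kaggle_environment(credentials):
    """Map the assumed "Kaggle" section to the variables the Kaggle client reads."""
    kaggle = credentials["Kaggle"]
    return {
        "KAGGLE_USERNAME": kaggle["username"],
        "KAGGLE_KEY": kaggle["key"],
    }


def export_kaggle_credentials(credentials):
    """Set the Kaggle variables; the Snowflake section is left untouched."""
    os.environ.update(kaggle_environment(credentials))
```

After export_kaggle_credentials(load_credentials(...)) the Kaggle client can authenticate, while the "Snowflake" section of the same dictionary remains available for the Snowflake connection.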

Download from Kaggle

The next step is downloading files from Kaggle. For this we use the Kaggle API, specifically the dataset_download_files() method.
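
A minimal sketch of such a download, assuming the credentials are already available to the Kaggle client. The wrapper download_dataset and its api parameter are my own additions, and the dataset name in the usage comment is a placeholder. Only KaggleApi, authenticate() and dataset_download_files() come from the Kaggle package.

```python
from pathlib import Path


def download_dataset(dataset, target_dir, api=None, unzip=False):
    """Download all files of a Kaggle dataset ("owner/dataset-name") into target_dir."""
    if api is None:
        # Imported lazily so the credentials can be set before the import.
        from kaggle.api.kaggle_api_extended import KaggleApi

        api = KaggleApi()
        api.authenticate()

    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)

    # Without unzip=True the dataset arrives as a .zip archive.
    api.dataset_download_files(dataset, path=str(target), unzip=unzip)
    return target


# Usage (placeholder dataset name):
# download_dataset("<owner>/<dataset-name>", "data")
```

Passing the client in as a parameter keeps the function easy to try out without a network connection.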

Unzip records

Data from Kaggle is downloaded in .zip format. You can let the Kaggle API unzip the files by passing unzip=True to dataset_download_files().

An alternative is to unzip the downloaded files yourself in the Python script.
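
One way to do that is with Python's standard zipfile module. This is a sketch of my own and not necessarily the statement used in the original script. The function name unzip_all is illustrative.

```python
import zipfile
from pathlib import Path


def unzip_all(folder):
    """Extract every .zip archive in folder into that same folder.

    Returns the names of the extracted members.
    """
    folder = Path(folder)
    extracted = []
    for archive in sorted(folder.glob("*.zip")):
        with zipfile.ZipFile(archive) as zip_file:
            zip_file.extractall(folder)
            extracted.extend(zip_file.namelist())
    return extracted
```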

Continuing

The remainder is similar to the previous post, From .csv to Snowflake:

  • Reading .csv Data
  • Creating Snowflake objects
  • Loading Data into Snowflake

Find the code for this blogpost on GitHub.

Thanks for reading and till next time.

Daan Bakboord – DaAnalytics
