
Loading F1 Historical Data into Snowflake using the Ergast Developer API

Ergast Developer API

Recently I blogged about loading F1 Historical Data into Snowflake. The first time, I downloaded the related .csv files from Kaggle. In a subsequent post I loaded the data directly using the Kaggle API.

  1. From .csv to Snowflake
  2. From Kaggle to Snowflake

This time I am going to use the Ergast Developer API. “The Ergast Developer API is an experimental web service which provides a historical record of motor racing data for non-commercial purposes.” The database is exposed through an API which can return data in XML, JSON or JSONP format. In this example we are going to work with the JSON format.

Preparation

Before we start this example it’s good to read through some documentation. The first stop is the Ergast Developer API website, which gives a lot of information about the different tables and how they relate to each other. It’s also good to have a look at the ERD and the Ergast Database User Guide. Postman offers a documentation collection on how to work with the Ergast API.

Loading Circuits Data

The first example should be relatively easy: just loading Circuits data. The first thing we need is the base API URL: https://ergast.com/api/f1. Depending on the data you want to load, this URL is extended with the table name and the response format, e.g. https://ergast.com/api/f1/circuits.json. The Postman documentation explains how to work with the Ergast API for (in this example) the Circuits.

Calling the Ergast API

Using the Postman documentation we can find the relevant JSON path. Then we have to specify which data we want to convert into a pandas DataFrame. So in a few steps we can load the required data into a DataFrame and from there into Snowflake. The Ergast API does not return all data in one go; results are paginated (the default limit is 30 rows per request, with a maximum of 1000). Therefore we have to loop through the API pages until all records are retrieved. Find below how to ‘Get the Ergast API’, ‘Parse the JSON Data’, ‘Loop through the API website’, ‘Prepare the JSON Data’ and ‘Construct a DataFrame’.
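
The steps above can be sketched in Python along the following lines. This is a minimal sketch; the page size and the helper names are my own, not taken from the original notebook:

```python
import requests
import pandas as pd

BASE_URL = "https://ergast.com/api/f1/circuits.json"
PAGE_SIZE = 100  # Ergast's default limit is 30; the maximum is 1000


def parse_page(payload: dict) -> tuple[list, int]:
    """Parse the JSON data: drill down to the circuit records and the total count."""
    mrdata = payload["MRData"]
    return mrdata["CircuitTable"]["Circuits"], int(mrdata["total"])


def get_circuits() -> pd.DataFrame:
    rows = []
    offset, total = 0, 1
    # Loop through the API pages until all records are retrieved
    while offset < total:
        resp = requests.get(BASE_URL, params={"limit": PAGE_SIZE, "offset": offset})
        resp.raise_for_status()
        page, total = parse_page(resp.json())
        rows.extend(page)
        offset += PAGE_SIZE
    # Construct a DataFrame; json_normalize flattens the nested Location object
    return pd.json_normalize(rows)
```

Splitting the JSON parsing into its own function keeps the pagination loop readable and makes the parsing step easy to verify against a sample response.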

I must say, I am not a Python expert. This is part of my learning experience. There might be other, better options, which I am open to exploring further. Currently I depend on my Google skills and the ability to convert my findings into working examples.

Loading Data into Snowflake

The DataFrame created in the previous paragraph can be used to load the data into Snowflake. In this example we have to ‘Connect to Snowflake’, ‘Create a Database / Schema’, ‘Create a Table’ and ‘Load the Table’.

Connect to Snowflake

First we have to connect to Snowflake from Python. I created a solution with a separate file for the credentials and a reference to that file in the .py file. Below is a short example.

  1. Credentials
  2. Validate
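
A minimal sketch of that setup, assuming the credentials live in a hypothetical creds.json file next to the script:

```python
import json


def load_credentials(path: str = "creds.json") -> dict:
    """Credentials: read them from a separate file so they stay out of the script.

    Expected (hypothetical) file layout:
    {"user": "...", "password": "...", "account": "..."}
    """
    with open(path) as f:
        return json.load(f)


def connect_to_snowflake(path: str = "creds.json"):
    # requires: pip install snowflake-connector-python
    import snowflake.connector

    creds = load_credentials(path)
    conn = snowflake.connector.connect(
        user=creds["user"],
        password=creds["password"],
        account=creds["account"],
    )
    # Validate: run a trivial query against the new connection
    print(conn.cursor().execute("SELECT CURRENT_VERSION()").fetchone()[0])
    return conn
```

Keeping the credentials file out of version control (e.g. via .gitignore) is the whole point of this split.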

Create a Database / Schema

With the connection to Snowflake ready, it’s time to create a Database and a Schema. Of course this can be done from the Snowflake UI, but for the sake of this example we do it from the Python-script.
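
That could look something like the snippet below; the database and schema names F1 and ERGAST are just placeholders for this example:

```python
def setup_statements(database: str = "F1", schema: str = "ERGAST") -> list:
    """SQL to create the database and schema, then make them the session default."""
    return [
        f"CREATE DATABASE IF NOT EXISTS {database}",
        f"CREATE SCHEMA IF NOT EXISTS {database}.{schema}",
        f"USE SCHEMA {database}.{schema}",
    ]


def create_database_and_schema(conn, database: str = "F1", schema: str = "ERGAST") -> None:
    """Run the setup statements over an open Snowflake connection."""
    cur = conn.cursor()
    for stmt in setup_statements(database, schema):
        cur.execute(stmt)
    cur.close()
```

Using IF NOT EXISTS makes the script safe to re-run.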

Create a Table

The next step is to create a table from a pandas DataFrame.
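
One way to do this is to derive a CREATE TABLE statement from the DataFrame’s dtypes. The dtype-to-Snowflake-type mapping below is a simplified assumption of mine, not a complete one:

```python
import pandas as pd

# Simplified mapping from pandas dtypes to Snowflake column types (assumption)
DTYPE_MAP = {
    "int64": "INTEGER",
    "float64": "FLOAT",
    "bool": "BOOLEAN",
    "object": "VARCHAR",
}


def create_table_sql(df: pd.DataFrame, table: str) -> str:
    """Build a CREATE TABLE statement from the DataFrame's columns and dtypes."""
    cols = ", ".join(
        f'"{col.upper()}" {DTYPE_MAP.get(str(dtype), "VARCHAR")}'
        for col, dtype in df.dtypes.items()
    )
    return f"CREATE OR REPLACE TABLE {table} ({cols})"
```

The column names are quoted in upper case so they line up with Snowflake’s identifier folding when the data is loaded later.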

Load the Table

Finally all is in place to load the data into the created Snowflake table.
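
A sketch using write_pandas from the connector’s pandas_tools module; the column-uppercasing helper is my own addition, so the DataFrame columns match upper-case column names in the target table:

```python
import pandas as pd


def prepare_for_snowflake(df: pd.DataFrame) -> pd.DataFrame:
    """Upper-case the column names so they match Snowflake's identifiers."""
    out = df.copy()
    out.columns = [c.upper() for c in out.columns]
    return out


def load_table(conn, df: pd.DataFrame, table: str = "CIRCUITS") -> None:
    # requires: pip install "snowflake-connector-python[pandas]"
    from snowflake.connector.pandas_tools import write_pandas

    success, nchunks, nrows, _ = write_pandas(conn, prepare_for_snowflake(df), table)
    print(f"Loaded {nrows} rows in {nchunks} chunks (success={success})")
```

write_pandas stages the DataFrame behind the scenes and runs a COPY INTO, which is considerably faster than row-by-row inserts.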

A working example

Now that we have seen the different components, it might be interesting to see a working example. Going back to the Circuits table, this could look something like the Jupyter notebook below.

At first I wasn’t sure how to get around messages like the one below:

“<snowflake.connector.cursor.SnowflakeCursor at 0x7fd4d5db0ee0>”

It turns out this is just Jupyter echoing the return value of the last expression in a cell; cursor.execute() returns the cursor object. Assigning the result to a variable or ending the line with a semicolon suppresses the output.

The end result in Snowflake is as expected. See below.

In this blog post we used the Ergast Developer API to load F1 Historical Data into a pandas DataFrame and from there into Snowflake. In a follow-up post I will do something similar for the remaining tables. The code is on GitHub.

Thanks for reading and till next time.

Daan Bakboord – DaAnalytics
