Recently Snowflake made working with Python from within Snowflake a little bit better. ‘Snowflake Worksheets for Python’ is in Public Preview, available for use for everybody. Time to find out how it works.
Generate Faker Data
I am planning a series of blogpost regarding Snowflake Data Governance in combination with Alation. Therefore I need some demo data. Recently I saw a blogpost of my fellow Snowflake Data Superhero; Maja Ferle about generating Sample Data using the faker Library within Snowpark. I took that blogpost as a starting point to generate some fake Customer Data using Snowflake Worksheets for Python.
Enable Anaconda Python Packages
To be able to use the Snowflake Worksheets for Python feature, you first need to review and accept the Anaconda Terms of Service in Snowsight. Therefor you need to follow the next steps:
- Select the ORGADMIN-role
- Go to Admin, Billing & Terms
- Enable Anaconda Python packages
- Acknowledge & Continue
Packages available in Anaconda
First you have to make sure that the required packages are available in Anaconda, by either checking the documentation or executing the following query:
If you want to select specific packages and their versions, execute the following query. In this example we are looking for; ‘faker’ and ‘pandas’.
There might be a need to include packages that are not available from Anaconda. According to the documentation, you can add a Python File from a Stage to a Worksheet.
Preparing Python Worksheet
Before you start developing in the Snowflake Worksheets for Python, the worksheet needs to be prepared.
- Select the Python Worksheet
- Select a database
- Select a warehouse
- Define how yo want to run the worksheet
- set the ‘Handler’ (the function to be called when executing the worksheet)
- set the ‘Return type’ (the type of result returned by the ‘Handler’)
- Select the required packages
- first select the package, then import the package in the worksheet
- Try to run the sample code to make sure things work.
Developing in Snowflake Worksheets for Python
When starting a new Python Worksheet, the sheet is filled with some sample code. This code can easily be changed according to your own needs. The code I needed to generate a simple Customer tabel is on GitHub.
- Add the code to the Python Worksheet and run the code
- View the results
- In this case the script (re-)creates and loads a table
- view the results by selecting from the newly created table
Note: If you run code via the Snowsight UI there will be added some additional execution time +/- 10 to 20 secs extra as it executes as a temporary Stored Procedure each run. If you register it as a Permanent Stored Procedure it will eliminate this extra execution time.
Deploy the code as a Stored Procedure
With a simple click of a button your Python script can be deployed as a Permanent Stored Procedure.
- Click ‘Deploy’
- Name the Stored Procedure
- Overwrite existing (Y/N)
- Call procedure
- Verify results in table
Python Worksheets for Snowflake are now in Public Preview which means it’s available to everyone. A native Python Code Editor in SnowSight including IntelliSense with auto-complete, Sowpark and third-party library support both via either Anaconda or manual upload of custom packages.
Deploying the Python scripts as Python Stored Procedures is easy and makes it possible to automatically schedule them via Snowflake Tasks.
More details in the Snowflake documentation; ‘Writing Snowpark Code in Python Worksheets’.
Till next time.