Daanalytics

Snowflake Summit 2022 — Recap

As I write this, I am at Harry Reid International Airport on my way back home to the Netherlands. This past week I attended Snowflake Summit 2022, the first big in-person event in years. Back in 2019 I attended the Snowflake Summit in San Francisco; if you ask me, this year's event was five times as big. Summits like these are an opportunity for me to meet like-minded people, absorb a ton of knowledge and see new innovations. This year's event did not disappoint.

Snowflake Data Superhero Dinner during Snowflake Summit 2022 in Las Vegas

First of all, I had the chance to meet my fellow Snowflake ❄️ Data Superheroes. Most of us had never seen each other in real life. We had great conversations and a lot of fun at the same time. Thanks to Snowflake for setting up and improving this program. Interested in becoming a Snowflake ❄️ Data Superhero yourself? Check out this page and get active in the Community: write articles and blogs, answer questions on the Snowflake forum and on StackOverflow, or become active in a (local) chapter of the Snowflake User Groups.

There were over 200 partners present at Snowflake Summit 2022, far too many to visit them all. Still, I managed to visit a few of them and had some interesting discussions about their products and solutions. Enough to try out and investigate further when I am back home.

Training Day

On Monday I decided it might be interesting to follow some training sessions covering lots of features released in the past year. We covered the following topics:

  • Snowsight
  • Data Governance — Object Tagging & Classification, Conditional Masking & Row Access Policies, and Object Dependencies & Access History
  • Data Engineering & Unstructured Data
  • Snowflake Scripting
  • Functions and Stored Procedures
  • SQL REST API
  • Streamlit
  • Data Science & ML — Snowpark for Python

It was an interesting day with lots of room for questions and hands-on exercises.

Disrupting Data Application Development in the Data Cloud

The Opening Keynote was on Tuesday. There are 7 pillars that the Snowflake Data Cloud is built upon:

  1. Mission alignment in verticals — Specific Data Strategies, e.g. Financials, Telco, Retail, etc
  2. All Data & all Workloads — Structured, Semi-Structured and Unstructured Data Formats in e.g. Data Warehousing, Data Science and Data Application Workloads. Access and process any type of Data at any scale and volume, from any origin, internally and externally shared. No contention for compute resources and no limit on workloads. Instant elasticity.
  3. Global, Cross-Cloud, Cross-Region — Data in the 3 major Clouds in various regions. Snowgrid allows you to treat them as one: one unified global experience, with replication between Clouds and regions. One global platform, operating at scale.
  4. Self Managed — Turn it on and go, Snowflake takes care of the management of the platform. One platform designed to just work. Simplicity is key, no complexity.
  5. Programmability — Going beyond database functionality. Run your most important Data Applications in the Data Cloud. Snowpark for Python in Public Preview.
  6. Marketplaces — Monetizing on both Data as well as Applications. Share Data and Applications through a community-driven Marketplace. An app-store for Data & Applications.
  7. Governance — Trust Snowflake with your Data, Secure and Compliant. Consistently enforced regardless of Cloud or region.

Snowflake made several announcements building upon the above pillars during this Opening Keynote:

General Announcements made at Snowflake Summit 2022

Core Platform

There were several announcements around performance improvements, beginning on AWS: faster compute, new t-shirt sizes (5XL and 6XL) in Private Preview, and Map Search improvements (Search Optimization Service).

Governance

If you want to trust Snowflake with your Data, there need to be solutions in place to ensure this happens in a trustworthy manner. The new Governance announcements made at the Summit are focused on three areas:

Snowflake announced two new features in Private Preview to create Resource Groups and set budgets, helping you control spending.

Tag-based Masking (coming soon to Public Preview) — easier automation of the assignment of masking policies to sensitive columns.
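Conceptually, tag-based masking means a tag attached to a column decides which masking policy applies, so you no longer assign policies column by column. The sketch below is purely illustrative (it is not Snowflake's API): hypothetical tags and masking functions show how a tag-driven policy lookup might behave.

```python
# Illustrative sketch only — NOT Snowflake's tag-based masking API.
# A tag attached to a column drives which masking function is applied
# for non-privileged roles; untagged columns pass through unchanged.

TAG_POLICIES = {
    "pii.email": lambda v: v.split("@")[0][0] + "***@" + v.split("@")[1],
    "pii.phone": lambda v: "***-***-" + v[-4:],
}

# Hypothetical column-to-tag assignments for a customer table.
COLUMN_TAGS = {"email": "pii.email", "phone": "pii.phone"}

def mask_row(row: dict, role: str) -> dict:
    """Return the row as a given role would see it."""
    if role == "PII_ADMIN":          # privileged role sees raw data
        return dict(row)
    out = {}
    for col, val in row.items():
        tag = COLUMN_TAGS.get(col)   # tag lookup drives the policy
        out[col] = TAG_POLICIES[tag](val) if tag else val
    return out
```

For example, an analyst querying `{"name": "Daan", "email": "daan@example.com"}` would see the email masked to `d***@example.com`, while a `PII_ADMIN` sees the raw value. In Snowflake the same idea is expressed with masking policies attached to tags rather than Python functions.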

Column Data Lineage (coming soon to Private Preview) — adds to the object-level lineage (already GA within Access History)

New Data Governance Interface (coming soon to Private Preview) — built-in reports for tags and masking policies.

To ensure that as little data as possible is lost in case of a failure, Snowflake has added several features. Replication goes beyond databases and will include Account Metadata (Public Preview) and Ingestion Pipelines (Private Preview).

Apache Iceberg (currently in Development).

According to Snowflake, there was significant Customer Demand to extend External Table support to connect to data stored in the Apache Iceberg format within the Data Cloud. The announced Apache Iceberg tables in Snowflake are still in Development. But what are Apache Iceberg tables? According to their website: “Iceberg is a high-performance format for huge analytic tables. Iceberg brings the reliability and simplicity of SQL tables to big data, while making it possible for engines like Spark, Trino, Flink, Presto, and Hive to safely work with the same tables, at the same time.” Check out more details on the Snowflake Blog.

Snowpipe Streaming (currently in Private Preview).

With this updated, re-designed version of the Apache Kafka connector, Data is immediately ready to query, improving latency by a factor of 10. Snowpipe Streaming is serverless (meaning you do not have to manage clusters) and ingests Data as row sets instead of via Stages.
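To see why row-based ingestion cuts latency, here is a toy sketch (not the real connector) contrasting the two models: a stage-based pipe only makes rows queryable once a buffered "file" is flushed, while a streaming channel makes every row queryable the moment it arrives.

```python
# Toy model — NOT the Snowflake Kafka connector or ingest SDK.
# Contrasts stage/file-based ingestion with row-based streaming.

class StagedPipe:
    """Buffers rows into a 'file'; rows become queryable only on flush."""
    def __init__(self, flush_size: int):
        self.flush_size = flush_size
        self.buffer, self.table = [], []

    def insert(self, row):
        self.buffer.append(row)
        if len(self.buffer) >= self.flush_size:
            self.table.extend(self.buffer)   # simulate staged-file load
            self.buffer.clear()

class StreamingChannel:
    """Each row is immediately part of the queryable table."""
    def __init__(self):
        self.table = []

    def insert(self, row):
        self.table.append(row)
```

With a flush size of 3, after two inserts the staged pipe still shows zero queryable rows while the streaming channel already shows two; that buffering delay is (loosely) the latency Snowpipe Streaming removes.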

Materialized Tables (currently in Development).

Materialized Tables for streaming data make it easy to join and transform Data in a streaming manner. A Materialized Table is a declarative Data Pipeline: describe what to do instead of how. Write declarative code for the transformation and Snowflake handles the incremental refresh to materialize the Pipeline. The Materialized Table automatically refreshes as new data streams in.
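The "declare once, refresh incrementally" idea can be sketched in a few lines of plain Python (this is a conceptual toy, not Snowflake's implementation): the transformation, here a hypothetical count per key, is declared once, and each new batch of streamed rows updates the materialized result without recomputing history.

```python
# Conceptual toy — NOT Snowflake's Materialized Tables implementation.
# A declared aggregation ("count of rows per key") is kept up to date
# incrementally as new rows stream in.

class MaterializedCount:
    """Materializes a count-per-key aggregation over a stream of rows."""
    def __init__(self):
        self.counts = {}

    def on_new_rows(self, rows):
        # Incremental refresh: only the new rows are processed,
        # never the full history of the stream.
        for row in rows:
            key = row["key"]
            self.counts[key] = self.counts.get(key, 0) + 1
```

After streaming the batch `[{"key": "a"}, {"key": "b"}, {"key": "a"}]` the materialized result holds `{"a": 2, "b": 1}`, and a later batch only touches the keys it contains. That is the declarative promise: you state the transformation, the engine owns the refresh.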

Check out the last two features above in this video by Tyler Akidau.

External Tables for On-prem Data

Connect to On-premises Data using storage compatible with the Amazon S3 REST API (Dell ECS, Pure Storage or MinIO). This storage can be used as an External Table in Snowflake, which will make it easier to load your On-premises Data into Snowflake.

Application Development Disruption

Snowpark for Python (currently in Private Preview).

Snowpark for Python is Snowflake’s Developer Framework. Snowflake finally supports Python, so you can build ML Applications in the language of your choice, next to e.g. Scala and Java. The most popular packages from Anaconda are seamlessly accessible for processing Data at any scale with Snowpark for Python. See things in action here.
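A nice property of Snowpark for Python is that a UDF is just an ordinary Python function, so you can unit-test the logic locally before it ever runs inside Snowflake. A minimal sketch, assuming a hypothetical `normalize_email` helper; the registration step in the comment assumes an existing Snowpark `session` and the `snowflake-snowpark-python` package:

```python
# Plain-Python logic of the kind you could register as a Snowpark UDF.
# The function itself has no Snowflake dependency, so it can be
# tested locally first.

def normalize_email(email: str) -> str:
    """Lowercase and trim an e-mail address; return '' for bad input."""
    if not email or "@" not in email:
        return ""
    return email.strip().lower()

# Hypothetical registration with an existing Snowpark session
# (requires snowflake-snowpark-python and a live connection):
#
#   from snowflake.snowpark.types import StringType
#   session.udf.register(normalize_email, name="normalize_email",
#                        input_types=[StringType()],
#                        return_type=StringType())
```

Once registered, the function can be called from SQL or from Snowpark DataFrame expressions, with the heavy lifting happening next to the data instead of on your laptop.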

Native Streamlit Integration

“Putting Insights to use”

Earlier this year Snowflake announced the acquisition of Streamlit. I had only just heard of Streamlit a few days before, during a Python training. It was, and still is, interesting to see what this acquisition means for Streamlit. Streamlit puts Insights to use: bring Data to life for everyone in Python. Prototype, iterate and collaborate.

Snowflake and Streamlit will be natively integrated. Streamlit apps can be selected from the Object Browser in Snowsight, and can be displayed and interacted with from within Snowsight. Streamlit remains Open Source, which is a good thing given the large community supporting it.

Build and share Data Applications via Snowsight Python Worksheets.

SQL Extensions for ML

Machine Learning capabilities are extended from within SQL, probably for those who are not so into Python or other programming languages.

“Bring Your Application to Your Customer’s Data”

Native Application Framework (currently in Private Preview).

Monetize Data Applications from within the Snowflake Marketplace: build, distribute, and deploy serverlessly. The challenges that come with selling Data Applications, like security, governance, and infrastructure, are all addressed from within the Snowflake Marketplace. More details here.

Unistore & Hybrid Tables

Unistore, built on Hybrid Tables (currently in Private Preview), brings transactional and analytical data together in a single platform. It is one of a variety of improvements and innovations that fit Snowflake’s vision and strategy to become more than just a platform where you store and process data. The Snowflake Data Cloud is intended to be a platform where you build and monetize data products and data applications. A lot of features are still in Private and Public Preview, but the future is interesting.

Check out the various new features on the Snowflake YouTube channel.

Summary

All in all, an interesting week in Las Vegas for me. Too much information to digest in one week, and a real challenge to keep up with all the new features and innovations. Enough to investigate further. This makes it even more fun to be part of the Snowflake Community. Snowflake changes, and all the partners change with it.

I hope this gave some insight into the latest and greatest of the Snowflake Data Cloud. If there are questions or a need for more detail, please let me know. I would be happy to discuss further.

Thanks for reading and till next time.

Daan Bakboord – DaAnalytics

See also:

Snowflake’s Data Classification in Snowsight

Snowflake Data Governance directly from Snowsight

Last year I blogged about how to use Snowflake functionality to “Know your Data”. Especially in these times where Generative AI becomes more and more mainstream, it is essential to know what data is input for the LLMs. Now Snowflake has made this a few clicks easier, offering classification functionality directly from Snowsight.

Read more »