In a previous blog I mentioned that Snowflake’s Data Governance Framework focuses on three key areas:
- Know your Data
- Protect your Data
- Connect your Ecosystem
In this blog we will focus on the key area ‘Connect your Ecosystem’, and specifically on the Data Governance Accelerated program within Snowflake. Recently Snowflake released a Governance dashboard in Public Preview. On this dashboard you can see some information about Tags and Policies on Tables and Columns.
The dashboard is a relatively static representation of the Data Governance setup in Snowflake. One of the selected partners in this program is Alation. With Alation you can natively connect to Snowflake, ingest metadata, and view and manage the applied Data Governance setup in Snowflake.
Data Governance Accelerated
As part of Data Governance Accelerated, Snowflake integrates with Data Governance tools like Alation to enable seamless metadata management, data cataloging, and governance workflows within the Snowflake environment. This integration helps organizations establish a unified data governance framework that spans both the Snowflake platform and their chosen data governance tool.
Alation
You can experience how Alation powers Data Intelligence on Snowflake by following this QuickStart. You can set things up either via Snowflake Partner Connect or manually. In this blog we will perform a few manual steps to show the Alation-Snowflake integration.
Adding a Snowflake Data Source
First we add a Snowflake Data Source to Alation. We will continue with the Tasty Bytes Quickstart dataset. Alation will perform the following functions in Snowflake:
- Metadata Extraction (MDE)
  - “Job fetches the database’s metadata (including schemas, tables, columns, and procedures/functions) to keep the representation of the database up to date”
- Profiling/Sampling
  - “Job periodically selects a limited number of rows from each table in the database to give users of this database a quick preview of the data content.”
- Query Log Ingestion (QLI)
  - “Alation can generate a wide variety of valuable information on database objects by examining the queries executed on them from a query history log. Some of the generated analysis includes lineage, popularity of the database objects, frequent users of database objects, and commonly used filters and expressions.”
Therefore we need to create an Alation user (a Service Account) in Snowflake, which must be assigned a role with enough privileges for the above jobs.
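To make the required privileges more concrete, here is a minimal Python sketch that only generates the Snowflake statements for such a Service Account. The names ALATION_SVC, ALATION_ROLE and ALATION_WH are hypothetical, and the set of grants is an assumption based on what the three jobs read (metadata, sample rows and query history). The Alation documentation may prescribe a different setup, so treat this as a starting point rather than the official procedure. Authentication (password or key pair) is deliberately left out.

```python
def alation_service_account_sql(
    user: str = "ALATION_SVC",        # hypothetical user name
    role: str = "ALATION_ROLE",       # hypothetical role name
    warehouse: str = "ALATION_WH",    # hypothetical warehouse name
    database: str = "FROSTBYTE_TASTY_BYTES",
) -> list[str]:
    """Return Snowflake statements for a read-only catalog service account."""
    return [
        # The role and the user that Alation will connect with.
        f"CREATE ROLE IF NOT EXISTS {role}",
        f"CREATE USER IF NOT EXISTS {user} "
        f"DEFAULT_ROLE = {role} DEFAULT_WAREHOUSE = {warehouse}",
        f"GRANT ROLE {role} TO USER {user}",
        # Compute to run the jobs on.
        f"GRANT USAGE ON WAREHOUSE {warehouse} TO ROLE {role}",
        # Metadata Extraction: the role must be able to see the objects.
        f"GRANT USAGE ON DATABASE {database} TO ROLE {role}",
        f"GRANT USAGE ON ALL SCHEMAS IN DATABASE {database} TO ROLE {role}",
        # Profiling/Sampling: the role must be able to read rows.
        f"GRANT SELECT ON ALL TABLES IN DATABASE {database} TO ROLE {role}",
        f"GRANT SELECT ON ALL VIEWS IN DATABASE {database} TO ROLE {role}",
        # Query Log Ingestion: assumed here to read the shared SNOWFLAKE
        # database, which contains the ACCOUNT_USAGE query history.
        f"GRANT IMPORTED PRIVILEGES ON DATABASE SNOWFLAKE TO ROLE {role}",
    ]


for statement in alation_service_account_sql():
    print(statement + ";")
```

The printed statements can be reviewed and then run in a Snowflake worksheet by a role that is allowed to create users and roles. Note that the `ON ALL ...` grants only cover objects that exist at that moment; new objects would need future grants or a rerun.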
Once the Service Account is in place, we can add the FROSTBYTE_TASTY_BYTES data source to Alation. Creating the Data Source is a matter of filling in the connection details. After that, the three jobs mentioned above need to be executed: ‘Run Metadata Extraction’, ‘Run Data Sampling’ and ‘Run Query Log Ingestion’.
First you run the ‘Metadata Extraction’. In the Snowflake Query History you can follow and monitor this process. You can see exactly which statements are executed and whether they were successful.
Second, you run ‘Data Sampling’. You can choose to sample either all tables or a subset. In this case there are not many tables, so we can go for ‘Sample all tables’. Again, you can follow and monitor the Query History to see exactly which statements are executed and whether they were successful.
Third and final is running the ‘Query Log Ingestion’. First you can do a preview; in this case you can see the queries of the first two jobs. After the preview, you can import the query logs.
When all three jobs have completed successfully, the Data Source is ready to be used.
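Following the jobs in the Query History can also be done with a query instead of the UI. The sketch below builds a statement on Snowflake's INFORMATION_SCHEMA.QUERY_HISTORY_BY_USER table function and summarizes the returned rows per execution status. The user name ALATION_SVC is hypothetical, and the sample rows are made up for illustration; they are not the statements Alation actually executes.

```python
from collections import Counter


def query_history_sql(user: str = "ALATION_SVC", limit: int = 500) -> str:
    """Build a query that lists the latest statements run by one user."""
    return (
        "SELECT query_text, execution_status, start_time "
        "FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY_BY_USER("
        f"USER_NAME => '{user}', RESULT_LIMIT => {limit})) "
        "ORDER BY start_time DESC"
    )


def summarize(rows):
    """Count statements per execution status and collect the failed ones."""
    counts = Counter(row["execution_status"] for row in rows)
    failed = [
        row["query_text"]
        for row in rows
        if row["execution_status"].startswith("FAILED")
    ]
    return counts, failed


# Made-up rows in the shape the query above would return.
sample_rows = [
    {"query_text": "SHOW SCHEMAS IN DATABASE FROSTBYTE_TASTY_BYTES",
     "execution_status": "SUCCESS"},
    {"query_text": "SELECT * FROM some_table LIMIT 100",
     "execution_status": "SUCCESS"},
    {"query_text": "SELECT * FROM missing_table LIMIT 100",
     "execution_status": "FAILED_WITH_ERROR"},
]

counts, failed = summarize(sample_rows)
print(query_history_sql())
print(dict(counts))
print(failed)
```

In practice you would execute the generated statement with your Snowflake client of choice and pass the fetched rows to `summarize`. Any entry in `failed` points to a statement the Service Account could not run, which is often a missing privilege.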
Metadata Extraction
While running Metadata Extraction, existing Tags are also imported from Snowflake, next to the schemas, tables, etc. Make sure that the METADATA_DB objects are created and available. Then you will be able to see and use the Tags.
Via the Policy Center you can see the imported policies we created earlier. Drilling down to the Data Source, Table and Column you can see where the policy is applied.
Compose Queries
Alation provides the feature to query Snowflake directly and, if desired, save these queries for later use. Queries on Snowflake tables are subject to the Data Governance features we applied earlier, according to the role one uses in Alation. In the example below you still see the asterisks in the email column and the phone_number column.
If you have a query you want to store in the Alation Catalog, e.g. ‘Total Sales for Single Women per City’, you can write the query in Compose and publish it to the Alation Catalog.
After publishing the query, you can find it on the associated table (in this case the CUSTOMER_LOYALTY_METRICS_V view) in the Alation Catalog.
Because of this stored query, a join is also added to the Alation Catalog.
Data Lineage
Based on the imported Metadata and the Query Log Ingestion, Alation is capable of providing Data Lineage information. This information is very helpful when implementing changes to existing objects.
Alation Data Lineage enables Impact Analysis (showing objects that are impacted downstream of an object) and Upstream Audit (showing upstream objects that impact an object).
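Conceptually, both views are traversals of the same lineage graph in opposite directions. The Python sketch below illustrates that idea with a plain breadth-first search. It is not how Alation implements lineage, and the object names (RAW_A, VIEW_X, and so on) are hypothetical.

```python
from collections import defaultdict, deque


def build_graph(edges):
    """Index (source, target) lineage edges in both directions."""
    downstream = defaultdict(set)
    upstream = defaultdict(set)
    for source, target in edges:
        downstream[source].add(target)
        upstream[target].add(source)
    return downstream, upstream


def reachable(start, neighbours):
    """Return every object reachable from start, excluding start itself."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(start)  # only relevant if the lineage contains a cycle
    return seen


# Hypothetical lineage: two raw tables feed a view, which feeds a second
# view, which feeds a report.
edges = [
    ("RAW_A", "VIEW_X"),
    ("RAW_B", "VIEW_X"),
    ("VIEW_X", "VIEW_Y"),
    ("VIEW_Y", "REPORT_Z"),
]
downstream, upstream = build_graph(edges)

# Impact Analysis: what is affected if RAW_A changes?
print(sorted(reachable("RAW_A", downstream)))

# Upstream Audit: which objects feed VIEW_Y?
print(sorted(reachable("VIEW_Y", upstream)))
```

For this small graph, changing RAW_A impacts VIEW_X, VIEW_Y and REPORT_Z, while VIEW_Y depends on VIEW_X, RAW_A and RAW_B.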
Wrapping up – ‘Five days inside Snowflake Data Governance’
This blogpost was a brief introduction to Alation for Snowflake. As we have seen, Alation provides users with a centralized location to discover, understand, and collaborate on data assets stored in Snowflake. With a few simple steps Snowflake assets can be integrated into the Alation platform and managed from there.
In a following blogpost I’ll probably go deeper into the Alation functionalities for Snowflake.
With this last post, five days inside Snowflake Data Governance come to an end. In five different blogposts we have covered various features of Data Governance and Snowflake. Using the Tasty Bytes Quickstart dataset we could create various examples.
We started with ‘Know your Data’. This aspect of Snowflake’s Data Governance Framework is all about understanding your data and its quality, lineage, and usage.
In our third blogpost we showed how to ‘Protect your Data’. Data protection is a critical component of Snowflake’s Data Governance Framework. Snowflake’s data protection features enable you to define and enforce data access policies, monitor data access and usage, and ensure compliance with industry and government regulations.
The last key area in Snowflake’s Data Governance Framework is ‘Connect your Ecosystem’. Collaboration is one of the important workloads in the Data Cloud. Snowflake Data Governance provides the necessary controls and mechanisms to ensure secure collaboration within the Snowflake Data Cloud.
An example of how Snowflake connects with its ecosystem is the Data Governance Accelerated program, where Snowflake integrates with Data Governance tools like Alation to enable seamless metadata management, data cataloging, and governance workflows within the Snowflake environment. We used some examples to show the integration between Alation and Snowflake. In later blogposts we will dive deeper into this integration.
Till next time.
Director Data & AI at Pong and Snowflake Data Superhero. Online better known as DaAnalytics.