Five days inside Snowflake Data Governance – Know your Data

In the previous blog I mentioned that Snowflake’s Data Governance Framework focuses on three key areas:

Know your Data
Protect your data
Connect your Ecosystem

Know your Data

In this blog we will focus on the key area; ‘Know your Data’. This aspect of Snowflake’s Data Governance Framework is all about understanding your data and its quality, lineage, and usage. It answers questions about the What & the Where (Classification & Object Tagging) and the Who (Account Usage and Access History) of your data.

For this blog series we will use the Tasty Bytes Quistart dataset.

Classification

Snowflake has built-in Automatic Data Classification, which makes it easier for customers to know and then protect their data.

There are several key use cases are for Data Classification

Data Stewards and Privacy Officers that need to know what sensitive personal data their organization has,
Data Security Admins that need to make sure the right protection is in place for this data (including anonymization which classification helps streamline),
Compliance Officers need to produce audit reports of who has accessed sensitive personal data.

Automatic Data Classification analyzes columns in structured data for personal information and provides customers with a set of pre-defined Snowflake System Tags to automatically categorize this data according to the supported types.

These System Tags automate the process of classifying data.

The System Tags are:

semantic_category – tells you what the cells of the column contain
- some possible values are: name, gender, age, us_postal_code, phone
privacy_category – tells you what kind of personal information it is
- some possible values are: identifier, quasi_identifier, insensitive

Now that the System Tags have been applied an admin can define policies based on their values.

In this way the admin can control who gets access to what columns in which tables thereby protecting your data and they can also use the System Tags to audit who has accessed columns that contain personal data.

Classification in Snowflake is a multi-step process where Snowflake-defined tags kan be applied to table columns. This is a three-step process:

Analyze
Review
Apply

Analyze

First step is to run the EXTRACT_SEMANTIC_CATEGORIES-function in Snowflake, which takes a specific table as input.

This statement returns a VARIANT in JSON format.

Review

This statement returns a VARIANT in JSON format like part of the output below.

By using the FLATTEN table function readability can be enhanced:

If you review the output of the above query, you will see that e.g. the column; ‘BIRTHDAY_DATE’ has the following output:

Secondary Category – ‘DATE_OF_BIRTH’
Privacy Category – ‘QUASI_IDENTIFIER’
Probability – 1
Alternates – ‘[]’

Check the documentation for further details about the above output and their relationship(s).

Apply

In the last Apply-step, the Calssification System tags can be assigned to the columns. This can both be done manual as well as automated;

The automated proces is a call to the following procedure:

If the proces runs successfully, the procedure ends with a message like; ‘Applied tag semantic_category to <n> columns. Applied tag privacy_category to <n> columns.’

Object Tagging

One of the Data Governance operations is Object Tagging. This operation enables us to categorize Data Objects and / or describe data stored in these objects. As we could see while reviewing the example above, there are several PII-columns. These columns are eligible for object tagging.

FIRST_NAME
LAST_NAME
PHONE_NUMBER
E_MAIL

Create tags

“A tag is a schema-level object that can be assigned to another Snowflake object. A tag can be assigned an arbitrary string value upon assigning the tag to a Snowflake object. Snowflake stores the tag and its string value as a key-value pair. The tag must be unique for your schema, and the tag value is always a string.”

We will create three different tags:

pii_name_tag (FIRST_NAME, LAST_NAME)
pii_phone_number_tag (PHONE_NUMBER)
pii_email_tag (E_MAIL)

Creating tags can e.g. be done by the following statement:

You can verify whether everything is been done correctly via the below statements where you can verify the DDL and see the allowed values for the specified tag.

Apply tags

Tags need to be assigned to (existing or new) Snowflake objects to be able to use them.

Track the tags

To validate the tags we have created and applied we can run e.g. the following query to track the tags.

This one way to identify the tag assignments. There are more. Follow the documentation to find out.

Object Dependencies

Identifying how Snowflake’s Data Objects are related can provide insight into various Data Tracking use cases, like Identifying dependencies and downstream impact. This can be achieved by querying the Account Usage view OBJECT_DEPENDENCIES. Check out the following query to see how the CUSTOMER_LOYALTY_V-view is related to the CUSTOMER_LOYALTY-table.

For more information, check out the documentation.

Who – ACCOUNT_USAGE & ACCESS_HISTORY

Up until now we covered the ‘What’ and the ‘Where’ about knowing your data. Understanding ‘Who’ has access to your data and how they are using it is a critical aspect of Data Governance.

Snowflake provides a set of built-in views, including the views in the ACCOUNT_USAGE-schema and the ACCESS_HISTORY-view, that enable you to monitor who is accessing your data and how they are using it.

The ACCOUNT_USAGE-schema provides information about user activity in your Snowflake account, such as who is logging in, when they are logging in, and which objects they are accessing. This schema also provides information about queries executed by users, including the SQL statement, execution time, and resource usage.

The ACCESS_HISTORY-view in Snowflake provides a detailed history of all access to databases, schemas, tables, views, and other database objects within a Snowflake account. This view allows administrators to monitor and audit user access to sensitive data, and identify potential security risks or violations.

By analyzing the information provided by these views, you can gain insights into who is accessing your data, how they are using it, and whether they are complying with Data Governance policies and regulations. This information can help you identify potential risks and vulnerabilities, and take appropriate actions to mitigate them.

Till next time.

Director Data & AI at Pong and Snowflake Data Superhero. Online better known as; DaAnalytics.

Daan Bakboord

Bekijk ook:

𝗜𝗻𝘁𝗿𝗼𝗱𝘂𝗰𝗶𝗻𝗴 𝘁𝗵𝗲 𝟮𝟬𝟮𝟱 𝗦𝗻𝗼𝘄𝗳𝗹𝗮𝗸𝗲 ❄️ 𝗗𝗮𝘁𝗮 𝗦𝘂𝗽𝗲𝗿𝗵𝗲𝗿𝗼𝗲𝘀! 🚀

Last week the Snowflake ❄️ Data Superhero Class of 2025 has been announced by Amilee Alesna. I am one the 105 worldwide Data Superheroes recognized by Snowflake. Looking back at my first avatar, created by Francis Mao, I realized that I am an OG Data Superhero since 2018.

Lees verder »

februari 9, 2025

How to use Snowflake’s Cortex Analyst

Setup a Snowflake Sample Dataset and query data with Cortex Analyst based on a Semantic Model generated by the Snowflake Semantic Model generator.

Lees verder »

februari 3, 2025

Snowflake BUILD Amsterdam – Cortex Analyst Hands-On Lab

Last Wednesday I had the privilege to organize and give a Snowflake BUILD Hands-On Lab. Snowflake BUILD is Snowflake’s yearly event for Developers, Data Scientists, Data Engineers, and all Data Professionals full “of exclusive product announcements, “how to” technical sessions, and hands-on labs focused on Snowflake’s latest innovations. Learn how to build data pipelines, models and apps in the age of generative AI and LLMs.”

Lees verder »

januari 17, 2025

Five days inside Snowflake Data Governance – Know your Data

Know your Data

Classification

Analyze

Review

Apply

Object Tagging

Create tags

Apply tags

Track the tags

Object Dependencies

Who – ACCOUNT_USAGE & ACCESS_HISTORY

Bekijk ook:

𝗜𝗻𝘁𝗿𝗼𝗱𝘂𝗰𝗶𝗻𝗴 𝘁𝗵𝗲 𝟮𝟬𝟮𝟱 𝗦𝗻𝗼𝘄𝗳𝗹𝗮𝗸𝗲 ❄️ 𝗗𝗮𝘁𝗮 𝗦𝘂𝗽𝗲𝗿𝗵𝗲𝗿𝗼𝗲𝘀! 🚀

How to use Snowflake’s Cortex Analyst

Snowflake BUILD Amsterdam – Cortex Analyst Hands-On Lab

DaAnalytics

Snelle links

Belangrijke informatie

Social media