Daanalytics

Data Quality Framework in Snowflake & Streamlit

Data is one of the most important assets in organizations today. If you cannot trust the Data, decision-making based on this Data is impossible. Measuring Data Quality is therefore essential. Inspired by the blog post “Data Quality Framework in Snowflake” by Divya Rajesh Surana, I decided to take it one step further by including Streamlit.

The Data Quality framework contains several checks, whose results are stored in a database table that is populated by a procedure. The code to create a Data Quality user, role, warehouse, database, etc. is on GitHub. The additional objects, like the table to hold the metrics and the procedure to populate the table, are also on GitHub.

In the original blog post, you call the procedure with the statement below:

When the procedure has run successfully, you can query the Data Quality Metrics table with the query below:
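From Python, the same two steps (call the procedure, then query the metrics table) could be sketched roughly as follows. The fully qualified object names come from this post; the three-argument signature of the procedure is an assumption based on the database, schema and table inputs described further down, so check the definition on GitHub before relying on it. The function name `run_quality_check` is hypothetical.

```python
# Fully qualified object names as mentioned in this post.
PROCEDURE = "QUALITY_ASSURANCE.QUALITY_CHECK.DATA_QUALITY"
METRICS_TABLE = "QUALITY_ASSURANCE.QUALITY_CHECK.DATA_QUALITY_METRICS"


def run_quality_check(cursor, database, schema, table):
    """Call the procedure, then return all rows of the metrics table.

    ASSUMPTION: the procedure accepts (database, schema, table) as three
    string arguments; the real signature is in the GitHub repository.
    `cursor` is any DB-API cursor using the pyformat (%s) parameter style,
    which is the default of the Snowflake Python connector.
    """
    # Bind the values as parameters instead of formatting them into the SQL.
    cursor.execute(f"CALL {PROCEDURE}(%s, %s, %s)", (database, schema, table))
    cursor.execute(f"SELECT * FROM {METRICS_TABLE}")
    return cursor.fetchall()
```

Passing the cursor in keeps the connection details out of the function, so it can be used with whatever connection the app already holds.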

Streamlit

While this works fine from within Snowflake, I thought it would also be interesting to build a Streamlit app for it. The app consists of:

  • input fields to select the table (database, schema and table) you want to measure for Data Quality
  • a button that calls the procedure
  • the output of the query in a table

In a previous Streamlit blog post I covered connecting to Snowflake, which I will not repeat here. That’s a relatively straightforward process.

Streamlit sidebar: input fields to select the table (database, schema and table) you want to measure for Data Quality

Creating a sidebar with some text inputs is easy. Call the Streamlit library through its alias ‘st’, and include ‘sidebar’ in the call to put the text inputs in the sidebar.

See the code below for more details.
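A minimal sketch of such a sidebar, assuming three plain text inputs. The labels and the function name `sidebar_inputs` are hypothetical; `st.sidebar.text_input` is the regular Streamlit call for a text field in the sidebar.

```python
def sidebar_inputs(st):
    """Render three text inputs in the sidebar and return their values.

    `st` is the Streamlit module (`import streamlit as st`); it is passed
    in as an argument so the function can be exercised without a running app.
    The labels below are illustrative, not taken from the original app.
    """
    database = st.sidebar.text_input("Database")
    schema = st.sidebar.text_input("Schema")
    table = st.sidebar.text_input("Table")
    # Strip accidental whitespace typed around the names.
    return database.strip(), schema.strip(), table.strip()
```

In the app script this would be used as `database, schema, table = sidebar_inputs(st)` right after `import streamlit as st`.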

Clicking the ‘Run Quality Assurance’ button executes the procedure ‘QUALITY_ASSURANCE.QUALITY_CHECK.DATA_QUALITY’. When all is OK, a green message appears: ‘Procedure has been executed successfully.’
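The button behaviour described above could look roughly like this. The button label and the success message are taken from this post; the error branch, the function name `handle_run_button` and the `run_procedure` callable are assumptions added for illustration, and the button is placed on the main page here, which may differ from the original app.

```python
def handle_run_button(st, run_procedure):
    """Run the procedure when the button is clicked and report the outcome.

    `st` is the Streamlit module; `run_procedure` is a hypothetical
    zero-argument callable that executes the CALL statement in Snowflake.
    Returns True only when the button was clicked and the call succeeded.
    """
    # st.button returns True only on the rerun triggered by the click.
    if not st.button("Run Quality Assurance"):
        return False
    try:
        run_procedure()
    except Exception as exc:
        # Assumption: failures are shown as a red error message.
        st.error(f"Procedure failed: {exc}")
        return False
    # st.success renders the green message.
    st.success("Procedure has been executed successfully.")
    return True
```

Keeping the database call behind a callable separates the Streamlit logic from the Snowflake connection.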

Executing the procedure results in the output below: a table with the results of querying the QUALITY_ASSURANCE.QUALITY_CHECK.DATA_QUALITY_METRICS table.
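One way to render that result table is to turn the rows into records and hand them to `st.dataframe`. This is a sketch rather than the code of the original app: the function name `show_metrics` is hypothetical, and the column names are read from the cursor at runtime because the layout of the metrics table is not described here.

```python
def show_metrics(st, cursor):
    """Query the metrics table and display the result as a table.

    `st` is the Streamlit module; `cursor` is a DB-API cursor on an open
    Snowflake connection. Returns the displayed records.
    """
    cursor.execute(
        "SELECT * FROM QUALITY_ASSURANCE.QUALITY_CHECK.DATA_QUALITY_METRICS"
    )
    # The first field of each description entry is the column name.
    columns = [col[0] for col in cursor.description]
    records = [dict(zip(columns, row)) for row in cursor.fetchall()]
    # st.dataframe renders a list of dicts as an interactive table.
    st.dataframe(records)
    return records
```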

The complete code of this Streamlit Application is on GitHub. In this case I have deployed the Application in the Streamlit Cloud.

Closing Statements

This was the first solution I found online, because I was reading on Medium. I will not argue that it is the best solution; I found other solutions as well. Taking someone else’s code as a head start and creating your own version of it is easy. Building a Streamlit application around it is even easier and worth a try. If I can do it...

Till next time.

Daan Bakboord – DaAnalytics
