Brillio Data Validation Suite used within an Azure-based Feature Store - Brillio
Tharun Mathew April 28, 2023

Brillio’s Data Validation Suite, built on Databricks and Great Expectations, is a rule-based solution that enables end-to-end data validation, monitoring, and reporting. Here are the key features of the solution:

  • Built on open-source components, so it can be implemented on any Spark-based platform with minimal changes.
  • Extended for the Azure platform using Azure Data Factory and Databricks.
  • Performs rule-based data validation on every table refresh (a minimal sketch follows this list).
  • Stores validation output in Delta tables and surfaces it in Power BI.
  • Triggers an automated email alert to data and platform owners on any deviation from set expectations.
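
To make the rule-based validation concrete, the minimal sketch below applies two Great Expectations checks to a Spark DataFrame inside a Databricks notebook (where `spark` is the session Databricks provides). The table and column names are hypothetical, and the legacy SparkDFDataset API shown here is only one of several ways Great Expectations can be wired to Spark:

```python
from great_expectations.dataset import SparkDFDataset

# Load the freshly refreshed table from the data lake (hypothetical name)
df = spark.read.table("feature_store.customer_features")

# Wrap the Spark DataFrame so expectations can run against it
ge_df = SparkDFDataset(df)

# Column-level rules; each call returns a result carrying a success flag
null_check = ge_df.expect_column_values_to_not_be_null("customer_id")
range_check = ge_df.expect_column_values_to_be_between(
    "tenure_months", min_value=0, max_value=600
)

print("null_check:", null_check.success)
print("range_check:", range_check.success)
```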


Figure 1: Solution Overview

As part of Brillio’s data validation solution, Azure Data Factory (ADF) is used to orchestrate Databricks notebooks. After the data sets within the Data Lake have been refreshed, the ADF triggers activate the validation notebooks. The validation notebooks read the rules and parameters from a SQL Server table or a config file within the data lake. At a column level, the rules define the benchmark against which the data must be validated. The following snippet illustrates a sample rule configuration. User-defined rules can easily be added or modified in these config files or tables.


Figure 2: Sample Rule Config Table
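
Since the rule configuration in Figure 2 renders as an image, the following is a hedged, text-only approximation of what such column-level rules might look like; the field names and values are illustrative, not the suite’s exact schema:

```python
# Illustrative rule configuration (hypothetical schema): each entry maps a
# table and column to a Great Expectations expectation plus its parameters.
rules = [
    {
        "table": "feature_store.customer_features",
        "column": "customer_id",
        "expectation": "expect_column_values_to_not_be_null",
        "kwargs": {},
    },
    {
        "table": "feature_store.customer_features",
        "column": "tenure_months",
        "expectation": "expect_column_values_to_be_between",
        "kwargs": {"min_value": 0, "max_value": 600},
    },
]
```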

The Databricks notebooks receive the tables to be checked and the rules to be validated as parameters. Upon completion of validation, the results are recorded in a SQL Server table. Users can build additional Power BI dashboards on top of these data sets, giving them a real-time view of the data’s accuracy.
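
As a rough sketch of how such a parameterized notebook could be structured, the snippet below reads the table name and rules from Databricks widgets (which ADF can populate at trigger time), dispatches each configured expectation by name, and appends the outcomes to a results table. The widget names, table names, and results target are assumptions rather than the suite’s actual code; writing to SQL Server, as the suite does, would typically go through a JDBC writer instead of the Delta target shown here. `spark` and `dbutils` are the globals Databricks provides in every notebook:

```python
import json
from great_expectations.dataset import SparkDFDataset

# Parameters supplied by the ADF pipeline run via notebook widgets
dbutils.widgets.text("table_name", "")
dbutils.widgets.text("rules_json", "[]")
table_name = dbutils.widgets.get("table_name")
rules = json.loads(dbutils.widgets.get("rules_json"))

ge_df = SparkDFDataset(spark.read.table(table_name))

results = []
for rule in rules:
    # Look up the expectation method by its configured name and apply it
    check = getattr(ge_df, rule["expectation"])(rule["column"], **rule["kwargs"])
    results.append((table_name, rule["column"], rule["expectation"], check.success))

# Persist outcomes for reporting (hypothetical Delta target; a SQL Server
# destination would use a JDBC write instead)
spark.createDataFrame(
    results, ["table_name", "column_name", "expectation", "success"]
).write.mode("append").saveAsTable("validation.results")
```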


Figure 3: Validation Result Snippet


Figure 4: Validation Result Output

As a result, the data validation utility gives users a comprehensive, real-time view of the accuracy of their data. Users are notified of any deviation from the parameters set in the rule table or configuration file, and can then perform a detailed analysis of the data to fix issues or decide whether to continue production runs.
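
The article does not spell out the alerting mechanism, but one simple way the email trigger could be sketched is shown below; the SMTP host, addresses, and results table are placeholders, and in practice the notification might equally be sent through an ADF activity or a Logic App:

```python
import smtplib
from email.message import EmailMessage

# Collect any failed checks from the results table (hypothetical name)
failures = spark.sql(
    "SELECT table_name, column_name, expectation "
    "FROM validation.results WHERE success = false"
).collect()

if failures:
    body = "\n".join(
        f"{r.table_name}.{r.column_name}: {r.expectation} failed" for r in failures
    )
    msg = EmailMessage()
    msg["Subject"] = "Data validation deviations detected"
    msg["From"] = "noreply@example.com"        # placeholder sender
    msg["To"] = "data-owners@example.com"      # placeholder recipients
    msg.set_content(body)
    with smtplib.SMTP("smtp.example.com") as server:  # placeholder SMTP host
        server.send_message(msg)
```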

About the Author

 

Tharun Mathew

Tharun Mathew is a highly skilled Senior Data Architect at Brillio EU and a global leader with strong expertise in building large-scale data lakes and lakehouses on Azure, AWS, Databricks, and Spark. He is also experienced in building enterprise feature stores and ML engineering products on Databricks and Spark.
