Case Study | Technology
The client is one of the largest debt management companies in Europe, on a mission to help its customers take control of their debt. The client was founded in 2015 following the merger of the UK and German market leaders.
The customer struggled with several challenges. The existing process for sample selection and feature engineering involved working with over 50 source tables across SQL databases, SAS datasets, and Excel files; query development, execution, and contingency planning could take up to four weeks to complete. Additionally, inconsistent primary keys and column naming conventions required in-depth knowledge of each table.
The process also relied on more than 30 SAS macros whose outputs had to be joined manually, resulting in ad hoc runs that could take anywhere from 5 minutes to 10 hours per macro for just one month of data.
The customer also needed a single view of consumer credit and debit data, together with backdated snapshots of current IDs and data. Because historic records were not captured, a complete picture of how consumers looked in the past was impossible.
Brillio leveraged Azure Data Lake and Databricks to build a data lakehouse and to design and build ML data engineering products. We implemented Databricks MLflow and Azure ML to capture machine learning experiments, model runs, and results, and used MLflow as a model registry to store, manage, and load models in production. Additionally, MLflow and DevOps pipelines were used to automate ML product integration and deployment across the organization (a minimal MLflow sketch follows the highlights below). The key highlights of the solution include:
Implementation of a “Sample Selector”, which provides data scientists with pre-built datasets from the most frequently used sources for training, testing, and validation purposes.
Addition of feature stores on Delta Lake that calculate and maintain the state of features over time, enabling time-travel, or “retro”, capabilities based on changes in data sources (see the Delta time-travel sketch after this list).
Implementation of model-validation and data-validation notebooks, orchestrated by Azure Data Factory (ADF) and Databricks, that run after model score generation or at set intervals to check score and data correctness (a validation sketch also follows below).
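As a hedged illustration of the experiment-tracking and registry pattern described above: the snippet below logs a run to MLflow, registers the resulting model, and loads it back for scoring. It assumes a tracking server with a model registry (such as Databricks); the experiment name, model name, and training data are hypothetical placeholders, not the client's actual artifacts.

```python
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Hypothetical experiment name -- a placeholder, not the client's.
mlflow.set_experiment("/Shared/collections-scoring")

X, y = make_classification(n_samples=500, n_features=10, random_state=42)

with mlflow.start_run(run_name="baseline-logreg"):
    model = LogisticRegression(max_iter=200).fit(X, y)
    mlflow.log_param("max_iter", 200)
    mlflow.log_metric("train_accuracy", model.score(X, y))
    # Registering under a name makes the model loadable from the registry.
    mlflow.sklearn.log_model(model, "model",
                             registered_model_name="collections_scorer")

# Later, in production: load the latest registered version for scoring.
scorer = mlflow.pyfunc.load_model("models:/collections_scorer/latest")
preds = scorer.predict(X[:5])
```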
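The feature-store “retro” capability rests on Delta Lake's built-in time travel, which lets a reader query a table as it looked at an earlier version or timestamp, served from the transaction log rather than a separate snapshot. A minimal sketch, assuming a Databricks/PySpark environment; the table path and dates are illustrative:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical feature table path -- illustrative only.
feature_path = "/mnt/lake/features/account_balances"

# Current state of the feature table.
current = spark.read.format("delta").load(feature_path)

# Retro view: the same table as of an earlier point in time.
retro = (
    spark.read.format("delta")
    .option("timestampAsOf", "2021-06-30")
    .load(feature_path)
)

# Alternatively, pin to a specific table version.
v5 = spark.read.format("delta").option("versionAsOf", 5).load(feature_path)
```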
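The validation notebooks themselves are not shown in the case study; as a hedged sketch, a post-scoring check might assert that scores stay in range and that key columns are populated, failing the notebook so the ADF orchestration surfaces the error. All table and column names below are assumptions:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical scores table produced by the model pipeline.
scores = spark.read.format("delta").load("/mnt/lake/scores/latest")

# Data correctness: the primary key must be present and unique.
null_keys = scores.filter(F.col("account_id").isNull()).count()
dup_keys = scores.count() - scores.select("account_id").distinct().count()

# Score correctness: probabilities must stay within [0, 1].
out_of_range = scores.filter((F.col("score") < 0) | (F.col("score") > 1)).count()

checks = {"null_keys": null_keys, "dup_keys": dup_keys,
          "out_of_range": out_of_range}
failed = {name: count for name, count in checks.items() if count > 0}
if failed:
    # Raising fails the notebook run, which the orchestrator reports.
    raise ValueError(f"Validation failed: {failed}")
```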
Following Brillio’s implementation, the client achieved:
A 90% reduction in query execution time on a like-for-like basis, achieved by performing sample selection against six aggregated tables that combine source data under consistent primary keys and naming conventions.
Faster run times, as users can join tables directly, filter queries at the source, and extract only the data required for sample selection (see the query sketch after this list).
The ability to update each feature store family individually across more than 20 years of historical data, with a cumulative run time of 4 hours.
Multiple views of consumer credit and debit data, plus a full, backdated view of historic IDs and data.
The ability to capture and include restated records by keeping historical retros.
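To make the aggregated-table pattern concrete: with consistent primary keys, sample selection collapses to a straightforward join, with filters applied before the join so Spark pushes them down to the source and scans only the required rows. A sketch only; the aggregated tables and their columns are hypothetical, not the client's six actual tables:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Two hypothetical aggregated tables sharing the key account_id.
accounts = spark.read.table("agg.accounts")
payments = spark.read.table("agg.payments")

# Filter at the source, join on the consistent key, and
# extract only the columns needed for the sample.
sample = (
    accounts.filter(F.col("portfolio") == "UK")
    .join(payments.filter(F.col("payment_date") >= "2022-01-01"),
          "account_id")
    .select("account_id", "balance", "payment_amount", "payment_date")
)
```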