Tooling and automation for ML model auditing

Updated: January 2022

Data Quality and Ops teams: Translators between Business KPI and ML Metric

Simply owning a performant ML model is rarely the end goal for an organization trying to build one. Usually, the organization is trying to improve an internal operation (e.g. defect detection) or a user-facing product (e.g. fraud alerts). Companies high on the AI adoption curve (think large self-driving car companies or FAMGA) hire lots of smart ML engineers to build models, but they also hire enormous QA and Ops teams to audit their ML-powered tools, setting downstream goals for model performance and measuring whether those goals are achieved. These teams assess where the models are failing against an operational or business metric, then hand this feedback back to the ML teams so they can improve the model. They are as crucial to making ML work in production as the engineers who build the models. To make these concepts concrete, let's work through an example for an AV company:

In this example, we have two participating parties: the ML and Ops teams. The Ops team audits the model to direct the development done by the ML team. The recommendation from the audit is context- and time-specific: there are likely many problems with the current model, but fixing hard-braking takes priority over everything else. Our thesis is that this bifurcation of the ML development cycle emerges in every business that successfully adopts ML. Having an Ops team audit and direct ML development is crucial to the success of an ML deployment, because that team's job is specifically to find what to work on next. We call this process "ML Auditing".

Most teams build custom tools and processes to audit their models. They use tools like data triggers, live data sample feeds, and metric dashboards to determine performance against their KPIs. Every company's operations involve external variables that determine how a business objective, whether improving user experience, maximizing revenue, or lowering operational costs, should be incorporated into model development. Understanding which metrics the model should target right now is a core responsibility of an Ops team doing ML Auditing.
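To make the idea of a data trigger concrete, here is a minimal sketch of what such a tool might look like. Everything in it (the `DataTrigger` class, the fraud-alert example, the 0.6 confidence threshold) is hypothetical, not a description of any particular company's tooling; the point is just that a trigger is a condition over live predictions that routes matching samples to the Ops team for review.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class DataTrigger:
    """Flags predictions that match a business-defined condition."""
    name: str
    condition: Callable[[Dict], bool]  # True when a prediction needs human review
    flagged: List[Dict] = field(default_factory=list)

    def observe(self, prediction: Dict) -> None:
        # Called once per live prediction; store the ones that fire the trigger.
        if self.condition(prediction):
            self.flagged.append(prediction)

# Hypothetical example: flag fraud alerts the model was unsure about,
# so the Ops team can sample them for manual audit.
low_confidence = DataTrigger(
    name="low_confidence_fraud_alerts",
    condition=lambda p: p["label"] == "fraud" and p["score"] < 0.6,
)

for p in [{"label": "fraud", "score": 0.55}, {"label": "ok", "score": 0.9}]:
    low_confidence.observe(p)

print(len(low_confidence.flagged))  # count of predictions routed to review
```

In a real deployment the flagged samples would feed the live data sample feeds and metric dashboards mentioned above, rather than sitting in an in-memory list.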

ML Auditing for the rest of us

If you aren't in this bucket of technical sophistication (most of us aren't), systematically translating business KPIs into ML model improvements is not an easy task. Hiring an ML Ops team to build these tools and do this work is a non-intuitive expense for management and demands a rare set of skills. Internal ML teams usually can't keep up with ever-changing business context or don't have the time to properly audit their models. Since hiring an internal ML team is hard on its own, many companies rely on external ML vendors to build their models, and those vendors certainly lack the business context to do this translation themselves. When these organizations try to adopt ML, these issues compound and undermine the deployment. We need tooling to quickly perform and navigate ML model audits. This tooling would need to give business stakeholders the ability to 1. measure the model's efficacy in their business, and 2. give actionable feedback to model developers that directs development.
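The two abilities above can be sketched together: measuring efficacy per business-relevant slice of the data, then ranking slices by failure rate so the feedback to model developers is concrete ("fix urban hard-braking first") rather than a single aggregate number. The function, the field names, and the AV-style records below are all illustrative assumptions, not a prescribed schema.

```python
from collections import defaultdict

def audit_by_slice(records, slice_key):
    """Group labeled predictions by a business-relevant attribute and
    report the error rate per slice, worst first, so the audit output
    tells the ML team which failure mode to prioritize."""
    totals = defaultdict(lambda: [0, 0])  # slice -> [errors, count]
    for r in records:
        stats = totals[r[slice_key]]
        stats[0] += int(r["prediction"] != r["label"])
        stats[1] += 1
    report = {s: errors / count for s, (errors, count) in totals.items()}
    return sorted(report.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical audit log for an AV braking classifier
records = [
    {"scenario": "highway", "prediction": "brake", "label": "brake"},
    {"scenario": "urban", "prediction": "brake", "label": "coast"},
    {"scenario": "urban", "prediction": "coast", "label": "coast"},
    {"scenario": "highway", "prediction": "brake", "label": "brake"},
]
for scenario, error_rate in audit_by_slice(records, "scenario"):
    print(f"{scenario}: {error_rate:.0%} error rate")
```

The ranked output is the audit in miniature: the top slice is the context- and time-specific recommendation the Ops team hands back to the ML team.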