The main function of MLOps is to automate the more repeatable steps in the ML workflows of data scientists and ML engineers, from model development and training to model deployment and use (model serving). Automating these steps creates agility for businesses and better experiences for users and end customers, increasing the speed, power, and reliability of ML. These automated processes can also reduce risk and free developers from routine tasks, allowing them to spend more time on innovation. All of this adds to the bottom line: a 2021 global McKinsey study found that companies that successfully scale AI can add as much as 20 percent to their earnings before interest and taxes (EBIT).
“It’s not uncommon for companies with advanced ML capabilities to incubate different ML tools in separate areas of the business,” said Vincent David, senior director for machine learning at Capital One. “But often you start to see parallels: ML systems do similar things, but with a slightly different twist. The companies figuring out how to get the most out of their ML investments are pooling and amplifying their best ML capabilities to create standardized, foundational tools and platforms that everyone can use – and ultimately create differentiated value in the marketplace.”
In practice, MLOps requires close collaboration between data scientists, ML engineers, and site reliability engineers (SREs) to ensure consistent reproducibility, monitoring, and maintenance of ML models. In recent years, Capital One has developed MLOps best practices that apply across all industries: balancing user needs, adopting a common cloud-based technology stack and foundational platforms, leveraging open source tools, and ensuring the right level of accessibility and governance for both data and models.
Understand the different needs of different users
ML applications generally have two main types of users: technical experts (data scientists and ML engineers) and non-technical experts (business analysts). It is important to balance their different needs. Technical experts often prefer complete freedom to use all available tools to build models for their intended use cases. Non-technical experts, on the other hand, need easy-to-use tools that allow them to access the data they need to create value in their own workflows.
To build consistent processes and workflows while satisfying both groups, David recommends meeting with the application design team and subject matter experts for a wide variety of use cases. “We look at specific use cases to understand the issues, so that users get what they need for their work in particular, but also for the business in general,” he says. “The key is figuring out how to create the right capabilities while balancing the different needs of stakeholders and business units within the enterprise.”
Adopt a common technology stack
Collaboration between development teams – critical to successful MLOps – can be difficult and time consuming if these teams don’t share the same technology stack. A unified tech stack allows developers to standardize, reusing components, functions, and tools across models like Lego bricks. “That makes it easier to combine related capabilities so developers don’t waste time switching from one model or system to another,” says David.
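To make the Lego-brick idea concrete, here is a minimal sketch of a standardized, reusable component built with the open source scikit-learn library: a tabular preprocessing block that any model team on a shared stack could import and compose with its own estimator. The function name and column arguments are illustrative, not Capital One's actual internal APIs.

```python
# A reusable "Lego brick": one standardized preprocessing component,
# shared by many model teams instead of being rebuilt for each model.
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def build_tabular_preprocessor(numeric_cols, categorical_cols):
    """Return a preprocessing block reusable across many models."""
    numeric = Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ])
    categorical = Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("encode", OneHotEncoder(handle_unknown="ignore")),
    ])
    return ColumnTransformer([
        ("num", numeric, numeric_cols),
        ("cat", categorical, categorical_cols),
    ])

# Two different teams can reuse the same brick, swapping only the estimator:
# Pipeline([("prep", build_tabular_preprocessor(nums, cats)), ("model", my_model)])
```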
A cloud-native stack, built to take advantage of the cloud model of distributed computing, enables developers to provision self-service infrastructure on demand, continuously leveraging new capabilities and introducing new services. Capital One’s decision to go all-in on the public cloud has had a remarkable impact on developer efficiency and speed. Code releases to production are now much faster and ML platforms and models are reusable across the wider enterprise.
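As a rough illustration of what self-service, on-demand infrastructure looks like in practice, the sketch below requests managed training compute through a single API call rather than a ticket to an operations team. It assumes an AWS SageMaker setup purely for illustration; the article does not name specific cloud services, and the job name, role ARN, image, and S3 URIs are all placeholders.

```python
# A developer provisions training infrastructure on demand with one call.
import boto3

sagemaker = boto3.client("sagemaker")

sagemaker.create_training_job(
    TrainingJobName="example-model-train-001",               # placeholder
    RoleArn="arn:aws:iam::123456789012:role/MLTrainingRole",  # placeholder
    AlgorithmSpecification={
        "TrainingImage": "123456789012.dkr.ecr.us-east-1.amazonaws.com/train:latest",
        "TrainingInputMode": "File",
    },
    InputDataConfig=[{
        "ChannelName": "train",
        "DataSource": {"S3DataSource": {
            "S3DataType": "S3Prefix",
            "S3Uri": "s3://example-bucket/train/",            # placeholder
        }},
    }],
    OutputDataConfig={"S3OutputPath": "s3://example-bucket/artifacts/"},
    ResourceConfig={"InstanceType": "ml.m5.xlarge",
                    "InstanceCount": 1,
                    "VolumeSizeInGB": 50},
    StoppingCondition={"MaxRuntimeInSeconds": 3600},
)
# The managed cluster spins up, trains, and tears itself down; no standing servers.
```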
Save time with open source ML tools
Open source ML tools (code and programs that are freely available for anyone to use and modify) are core ingredients in building a strong cloud foundation and unified tech stack. By using existing open source tools, a company does not have to spend precious engineering resources reinventing the wheel, and teams can build and deploy models faster.
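The short sketch below shows the idea: rather than writing a bespoke training and experiment-tracking system, a team leans on two widely used open source tools, scikit-learn for modeling and MLflow for tracking runs and model artifacts. The dataset and parameter choices are illustrative only.

```python
# Training plus experiment tracking with off-the-shelf open source tools.
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

with mlflow.start_run():
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)

    # Parameters, metrics, and the model itself are logged for reproducibility.
    mlflow.log_param("max_iter", 1000)
    mlflow.log_metric("test_accuracy", model.score(X_test, y_test))
    mlflow.sklearn.log_model(model, "model")  # versioned, redeployable artifact
```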