Suppose your company’s data science teams have documented business goals for areas where analytics and machine learning models can have business impact. Now they are ready to start. They have labeled datasets, selected machine learning technologies, and established a process for developing machine learning models. They have access to a scalable cloud infrastructure. Is that enough to give the team the green light to develop machine learning models and deploy successful ones to production?
Not so fast, say some machine learning and artificial intelligence experts who know that every innovation and production deployment carries risks that require revisions and remediation strategies. They advocate establishing risk management practices early in the development and data science process. “In data science or any other similar business activity, innovation and risk management are two sides of the same coin,” says John Wheeler, senior risk and technology advisor for AuditBoard.
To draw an analogy with application development: software developers don’t just write code and deploy it to production without considering risk and best practices. Most organizations establish a software development life cycle (SDLC), shift devsecops practices left, and create observability standards to address risk. These practices also ensure that development teams can maintain and improve the code once it is deployed to production.
The equivalent of SDLC in managing machine learning models is modelops, a set of practices for managing the lifecycle of machine learning models. Modelops practices include how data scientists build, test, and deploy machine learning models in production, and then how they monitor and improve ML models to ensure they deliver expected results.
Risk management is a broad category of potential problems and their resolution, so I focus on those related to modelops and the machine learning lifecycle in this article. Other risk management topics include data quality, data privacy, and data security. Data scientists should also review training data for bias and consider other important responsible AI and ethical AI factors.
After speaking with several experts, I identified five problem areas that modelops practices and technologies can help solve.
Risk 1. Developing models without a risk management strategy
In the State of Modelops 2022 report, more than 60% of AI business leaders indicated that risk management and regulatory compliance are difficult. Data scientists are typically not risk management experts, so in enterprises, a first step should be to partner with risk management leaders and develop a strategy aligned with the modelops lifecycle.
Wheeler says, “The purpose of innovation is to find better ways to achieve the desired business outcome. For data scientists, this often means creating new data models to improve decision making. However, without risk management, this desired business outcome can come at a high cost. When striving to innovate, data scientists must also seek to create reliable and valid data models by understanding and mitigating the risks inherent in data.”
Two white papers to learn more about model risk management come from Domino and ModelOp. Data scientists should also institute data observability practices.
Risk 2. Increased maintenance with duplicate and domain-specific models
Data science teams also need to create standards for which business problems to focus on and how to generalize models so that they work across one or more business domains. Data science teams should avoid building and maintaining multiple models that solve similar problems; they need effective techniques for training models in new business areas.
Srikumar Ramanathan, Director of Solutions at Mphasis, recognizes this challenge and its impact. “Whenever the domain changes, ML models are trained from scratch, even when using standard machine learning principles,” he says.
Ramanathan offers this remediation. “By using incremental learning, in which we use the input data continuously to extend the model, we can train the model for new domains using fewer resources.”
Incremental learning is a technique for training models on new data continuously or at a defined cadence. There are examples of incremental learning on AWS SageMaker, Azure Cognitive Search, Matlab, and Python River.
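To make the idea concrete, here is a minimal sketch of incremental learning in plain Python. The class OnlineLinearRegressor and its learn_one and predict_one methods are hypothetical names chosen for illustration; they are not the API of any platform listed above. The sketch assumes a simple linear model updated by one gradient step per example, and the training values are made up.

```python
from dataclasses import dataclass, field


@dataclass
class OnlineLinearRegressor:
    """Hypothetical linear model that is updated one example at a time."""
    n_features: int
    learning_rate: float = 0.3
    weights: list = field(default_factory=list)
    bias: float = 0.0

    def __post_init__(self):
        if not self.weights:
            self.weights = [0.0] * self.n_features

    def predict_one(self, x):
        return self.bias + sum(w * xi for w, xi in zip(self.weights, x))

    def learn_one(self, x, y):
        # One stochastic gradient step on squared error for a single example.
        error = self.predict_one(x) - y
        self.weights = [
            w - self.learning_rate * error * xi
            for w, xi in zip(self.weights, x)
        ]
        self.bias -= self.learning_rate * error


model = OnlineLinearRegressor(n_features=1)

# Original domain (illustrative data): y = 2x + 1
for _ in range(200):
    for x in (0.0, 0.5, 1.0):
        model.learn_one([x], 2 * x + 1)
prediction_before_shift = model.predict_one([1.0])

# New domain: the relationship changes to y = 3x + 1. The same model keeps
# learning from the incoming examples instead of being rebuilt from scratch.
for _ in range(200):
    for x in (0.0, 0.5, 1.0):
        model.learn_one([x], 3 * x + 1)
prediction_after_shift = model.predict_one([1.0])
```

The point of the sketch is the shape of the workflow: the model object persists across domains and absorbs new data through small updates, which is what lets a team extend a model to a new domain with fewer resources than a full retrain.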
Risk 3. Deploying too many models for the capacity of the data science team
The challenge of maintaining models goes beyond the steps to retrain them or implement incremental learning. Kjell Carlsson, head of data science strategy and evangelism at Domino Data Lab, says, “A growing but largely overlooked risk is the constantly lagging ability of data science teams to redevelop and redeploy their models.”
In the same way that devops teams measure feature delivery and deployment cycle time, data scientists can measure their model velocity.
Carlsson explains the risk and says, “Model velocity is typically well below what is needed, resulting in a growing backlog of underperforming models. As these models become increasingly critical and embedded across all businesses, combined with accelerating changes in customer and market behavior, this creates a ticking time bomb.”
Dare I label this problem “model debt”? As Carlsson suggests, measuring model velocity and the business impacts of underperforming models is the key starting point for managing this risk.
Data science teams should consider centralizing a model catalog or registry so that team members know the scope of existing models, their status in the ML model lifecycle, and who is responsible for managing them. Model catalog and registry functionality can be found in data catalog platforms, ML development tools, and MLops and modelops technologies.
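One way a team might start measuring model velocity is sketched below. The article does not give a formula, so this assumes one possible definition: the number of days between the start of rework on a model and its redeployment. The Redeployment record, the 30-day target, and all of the dates are hypothetical and exist only to illustrate the calculation.

```python
from dataclasses import dataclass
from datetime import date
from statistics import median


@dataclass(frozen=True)
class Redeployment:
    """Hypothetical record of one redevelop-and-redeploy cycle."""
    model_name: str
    rework_started: date
    redeployed: date

    @property
    def cycle_days(self) -> int:
        return (self.redeployed - self.rework_started).days


def median_cycle_days(events):
    """Median days from the start of rework to redeployment."""
    return median(e.cycle_days for e in events)


def slower_than(events, target_days):
    """Names of models whose cycle exceeded the target."""
    return [e.model_name for e in events if e.cycle_days > target_days]


# Illustrative data only.
history = [
    Redeployment("churn", date(2022, 1, 3), date(2022, 2, 14)),
    Redeployment("pricing", date(2022, 3, 1), date(2022, 3, 22)),
    Redeployment("fraud", date(2022, 4, 4), date(2022, 6, 6)),
]

velocity = median_cycle_days(history)
lagging = slower_than(history, target_days=30)
```

Tracking a median rather than a mean keeps one unusually slow project from hiding the typical cycle time, and the list of lagging models gives a first view of the backlog.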
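As a rough illustration of what such a catalog tracks, here is a minimal in-memory sketch. ModelRegistry, ModelRecord, and Stage are hypothetical names, not the API of any product mentioned in this article, and a real registry would add persistence, versioning, and access control.

```python
from dataclasses import dataclass
from enum import Enum


class Stage(Enum):
    DEVELOPMENT = "development"
    PRODUCTION = "production"
    RETIRED = "retired"


@dataclass
class ModelRecord:
    name: str
    owner: str
    stage: Stage
    purpose: str


class ModelRegistry:
    """Hypothetical central catalog of models, their status, and owners."""

    def __init__(self):
        self._records = {}

    def register(self, record):
        # Rejecting duplicate names nudges teams to reuse an existing model.
        if record.name in self._records:
            raise ValueError(f"model {record.name!r} is already registered")
        self._records[record.name] = record

    def by_stage(self, stage):
        return [r for r in self._records.values() if r.stage is stage]

    def owner_of(self, name):
        return self._records[name].owner


registry = ModelRegistry()
registry.register(
    ModelRecord("churn", "alice", Stage.PRODUCTION, "predict customer churn")
)
registry.register(
    ModelRecord("pricing", "bob", Stage.DEVELOPMENT, "suggest price bands")
)

in_production = [r.name for r in registry.by_stage(Stage.PRODUCTION)]
```

Even this small structure answers the three questions in the paragraph above: which models exist, where each one sits in the lifecycle, and who owns it.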
Risk 4. Being blocked by bureaucratic review boards
Assume that the data science team followed the organization’s standards and best practices for data and model governance. Are they finally ready to deploy a model?
Risk management organizations may wish to institute review committees to ensure that data science teams mitigate all reasonable risks. Risk assessments can be reasonable when data science teams are just beginning to deploy machine learning models in production and adopt risk management practices. But when is a review board needed and what should you do if the board becomes a bottleneck?
Chris Luiz, Director of Solutions and Success at Monitaur, offers an alternative approach. “A better solution than a top-down, post-hoc, and draconian executive review board is a combination of strong governance principles, software products that match the data science lifecycle, and strong stakeholder alignment throughout the governance process.”
Luiz has several recommendations on modelops technologies. He says, “Tooling should fit seamlessly into the data science lifecycle, maintain (and preferably increase) the speed of innovation, meet stakeholder needs, and provide a self-service experience for non-technical stakeholders.”
Modelops technologies with risk management capabilities include platforms from Datatron, Domino, Fiddler, MathWorks, ModelOp, Monitaur, RapidMiner, SAS and TIBCO Software.
Risk 5. Not monitoring models for data drift and operational issues
When a tree falls in the forest, will anyone notice? We know that code must be maintained to support framework, library, and infrastructure upgrades. When an ML model is underperforming, do trend monitors and reports alert data science teams?
“Every AI/ML model put into production is guaranteed to degrade over time due to changing data from dynamic business environments,” said Hillary Ashton, executive vice president and product manager at Teradata.
Ashton recommends: “Once in production, data scientists can use modelops to automatically detect when models are starting to degrade (reactive, via concept drift) or are likely to start degrading (proactive, via data drift and data quality drift). They can be alerted to investigate and take action, such as retrain (refresh the model), retire (full remodeling required), or ignore (false alarm). In the case of retraining, remediation can be fully automated.”
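A minimal sketch of a data drift check follows. It assumes the Population Stability Index (PSI) as the drift measure, which is one common choice and not necessarily the technique Ashton has in mind. The function names are hypothetical, the 0.1 and 0.25 thresholds are a widely used rule of thumb rather than a standard, and the samples are synthetic.

```python
import math
import random


def psi(reference, current, bins=10):
    """Population Stability Index between two numeric samples."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clamp outliers into edge bins
            counts[idx] += 1
        # A small floor avoids log(0) when a bin is empty.
        return [max(c / len(values), 1e-6) for c in counts]

    ref_p = proportions(reference)
    cur_p = proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))


def recommend_action(score, warn=0.1, act=0.25):
    """Map a drift score to an action; thresholds are assumptions."""
    if score >= act:
        return "retrain"
    if score >= warn:
        return "investigate"
    return "ignore"


rng = random.Random(0)
training_sample = [rng.gauss(0.0, 1.0) for _ in range(2000)]
stable_sample = [rng.gauss(0.0, 1.0) for _ in range(2000)]
shifted_sample = [rng.gauss(1.5, 1.0) for _ in range(2000)]

stable_score = psi(training_sample, stable_sample)
shifted_score = psi(training_sample, shifted_sample)
action = recommend_action(shifted_score)
```

In practice a check like this would run on a schedule for each input feature, with the resulting action feeding an alert or an automated retraining job.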
What you need to take away from this review is that data science teams need to define their modelops lifecycle and develop a risk management strategy for key milestones. Data science teams should partner with their compliance and risk managers and use tools and automation to centralize a catalog of models, improve model velocity, and reduce the impacts of data drift.
Copyright © 2022 IDG Communications, Inc.