Are you ready to build production-grade AI solutions?
June 7, 2022 · 6 min read
AI, and machine learning (ML) in particular, has become a mainstream topic in many industries, from retail and entertainment to healthcare and manufacturing.
Every ML solution is based on a mathematical model that takes input data and performs classification, regression, or clustering. The model is trained rather than explicitly programmed. The model's lifecycle, from training to deployment, must be reproducible, and the whole process must be compatible with DevOps. DevOps for machine learning, often called MLOps, is based on three fundamental activities:
- Train
- Deploy
- Monitor
The life cycle is a loop: models are trained, deployed, and monitored, and the monitoring results feed into the next round of training.
This article is a quick guide to the main pillars of developing repeatable, high-quality machine learning models in Microsoft Azure, and to what you need to know to get ready for production ML projects.
Building reproducible ML Workflows
ML engineers, or more generally data scientists, spend a lot of time in Jupyter notebooks doing exploratory data analysis and training different versions of ML models. In most cases the models need retraining to keep their predictions good, as measured by accuracy and other metrics. Retraining can be required for many reasons: for example, when a new class is added to an image classifier, when environmental conditions change for sensor readings, or when a new product is added to an e-commerce portfolio and the recommendation system has to stay up to date.
While model training is usually a straightforward task, problems arise when you need to work out which model version went into production last and what data you used to train it, so that you don’t repeat the same training process all over again.
In order to track all these aspects, Azure offers the Machine Learning Studio, a platform that accelerates the end-to-end machine learning lifecycle.
The Azure Machine Learning Studio provides all necessary MLOps capabilities, ranging from creating reproducible ML pipelines and reusable software environments, to packaging and deploying models, monitoring for operational and ML issues, as well as retraining your model on new data.
The goal of all these steps is faster experimentation, faster deployment of models into production, and, last but not least, quality assurance and end-to-end lineage tracking.
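As a rough illustration of what lineage tracking involves, the plain-Python sketch below fingerprints a training data set and stores the fingerprint next to the code version and metrics of a run. It does not use Azure Machine Learning. The names `TrainingRecord`, `fingerprint`, and `already_trained`, as well as the sample values, are hypothetical and chosen only for this example.

```python
import hashlib
from dataclasses import dataclass, field


def fingerprint(data: bytes) -> str:
    """Return a SHA-256 hex digest that identifies the exact training data."""
    return hashlib.sha256(data).hexdigest()


@dataclass
class TrainingRecord:
    """What was trained, from which code, on which data, with what result."""
    model_name: str
    code_version: str       # assumed convention: a source-control commit id
    data_fingerprint: str
    metrics: dict = field(default_factory=dict)


def already_trained(history, code_version, data):
    """Return the earlier record for this code version and data, or None."""
    digest = fingerprint(data)
    for record in history:
        if record.code_version == code_version and record.data_fingerprint == digest:
            return record
    return None


# Hypothetical sample data: a tiny CSV held in memory.
original_data = b"id,units\n1,10\n2,12\n"
extended_data = b"id,units\n1,10\n2,12\n3,9\n"

history = [
    TrainingRecord("demand-forecast", "a1b2c3d", fingerprint(original_data), {"accuracy": 0.91}),
]

same = already_trained(history, "a1b2c3d", original_data)     # found: no need to retrain
changed = already_trained(history, "a1b2c3d", extended_data)  # None: the data is new
```

With a record like this for every run, the question of which data produced the last production model becomes a lookup instead of guesswork.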
Machine learning pipelines
ML pipelines stitch together all of the steps involved in your model training process. A typical ML pipeline can contain steps ranging from data preparation and feature extraction to hyperparameter tuning and model evaluation.
Azure Machine Learning offers two ways to create ML pipelines:
- Code-based approach
- No-code, a visual way using the Designer
The first is a code-based approach using the Azure Machine Learning Python SDK. Here, a Pipeline object contains an ordered sequence of one or more pipeline steps. A Pipeline then runs as part of an Experiment.
The second is a no-code, visual approach using the Designer. You can open this tool from the Designer option on the homepage of your workspace. The Designer lets you drag and drop steps onto the canvas. Because the inputs and outputs of each step are shown visually, you can drag and drop data connections and quickly understand and modify the dataflow of your pipeline.
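The idea of an ordered sequence of steps can be sketched without any SDK. The example below is plain Python, not the Azure Machine Learning API: `run_pipeline` and the three toy steps are hypothetical stand-ins, and the "training" step is only an average so that the example stays self-contained.

```python
from typing import Any, Callable, List, Tuple

Step = Tuple[str, Callable[[Any], Any]]


def run_pipeline(steps: List[Step], data: Any) -> Tuple[Any, List[str]]:
    """Run the steps in order, feeding each step's output into the next one."""
    executed = []
    for name, func in steps:
        data = func(data)
        executed.append(name)
    return data, executed


def prepare(rows):
    """Data preparation: drop missing values."""
    return [r for r in rows if r is not None]


def extract_features(rows):
    """Feature extraction: pair each value with its square."""
    return [(r, r * r) for r in rows]


def train(features):
    """Stand-in for training: average the second feature."""
    return sum(f[1] for f in features) / len(features)


steps = [
    ("prepare", prepare),
    ("extract_features", extract_features),
    ("train", train),
]
model, executed = run_pipeline(steps, [1, None, 2, 3])
```

A real pipeline service adds what this sketch leaves out, such as running each step on its own compute target and reusing the cached output of steps whose inputs have not changed.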
Registering and tracking ML models
Model registration allows you to store and version your models in the Azure cloud. The model registry makes it easy to organise and keep track of your trained models. Registered models are identified by name and version. Each time you register a model with the same name as an existing one, the registry increments the version. You can also attach tags and use them when searching for a model. When you use the SDK to train a model, you receive a Run object that can be used to register the model created by an experiment run.
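The name-plus-version behaviour described above can be illustrated with a small in-memory registry. This is a simplified sketch, not the Azure model registry or its API: the `ModelRegistry` class, its methods, the model name, and the file path are all hypothetical.

```python
class ModelRegistry:
    """In-memory stand-in for a model registry (hypothetical, for illustration)."""

    def __init__(self):
        self._models = {}  # model name -> list of entries, oldest first

    def register(self, name, path, tags=None):
        """Store a model; its version is one higher than the latest for this name."""
        entries = self._models.setdefault(name, [])
        entry = {
            "name": name,
            "version": len(entries) + 1,
            "path": path,
            "tags": dict(tags or {}),
        }
        entries.append(entry)
        return entry

    def latest(self, name):
        """Return the most recently registered version of the named model."""
        return self._models[name][-1]

    def find_by_tag(self, key, value):
        """Return every entry, across all names, that carries the given tag value."""
        return [
            entry
            for entries in self._models.values()
            for entry in entries
            if entry["tags"].get(key) == value
        ]


registry = ModelRegistry()
first = registry.register("churn-classifier", "outputs/model.pkl", {"stage": "experiment"})
second = registry.register("churn-classifier", "outputs/model.pkl", {"stage": "production"})
in_production = registry.find_by_tag("stage", "production")
```

Registering under an existing name never overwrites anything, so an earlier version stays available if a newer one has to be rolled back.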
Deploying and using ML models
Data scientists want to be able to keep their focus on model development, rather than having to upskill on the many technologies needed for service hosting such as Kubernetes, Flask, and Swagger. Azure ML helps data scientists seamlessly transition their models into mature and fully featured web services, enabling them to be rapidly adopted in applications. First, you fetch the latest model from the registry and profile it, to see what kind of resources it might need in production.
With the profiling data in hand, you use Azure ML to package the model into a container and deploy it to an Azure Container Instance for real-time or batch inferencing. Azure ML provides automatic schema generation and monitoring of the model once deployed. You can deploy your models wherever they are needed, from the cloud to the edge, on CPU, GPU, or FPGA compute. With just a couple of lines of code, you can push the same model to a more scalable Azure Kubernetes Service instance and even use A/B testing to validate a new model.
Once the model is deployed to an Azure Kubernetes Service instance, the development team can call the web service endpoint from their application for real-time or batch inferencing. The model endpoint continues to be monitored for data drift and other metrics, which can be audited as needed.
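To give a feel for what a data drift check does, here is one simple heuristic: measure how far the mean of recent inputs has moved from the training baseline, in units of the baseline's standard deviation. This is an assumption made for illustration and not necessarily how Azure ML computes drift. The function names, the threshold of 2.0, and the sensor readings are hypothetical.

```python
from statistics import mean, stdev


def drift_score(baseline, recent):
    """Shift of the recent mean from the baseline mean, in baseline standard deviations."""
    spread = stdev(baseline)
    shift = abs(mean(recent) - mean(baseline))
    if spread == 0:
        # A constant baseline: any shift at all counts as unbounded drift.
        return 0.0 if shift == 0 else float("inf")
    return shift / spread


def has_drifted(baseline, recent, threshold=2.0):
    """Flag drift when the score exceeds the (assumed) threshold."""
    return drift_score(baseline, recent) > threshold


# Hypothetical sensor readings in degrees Celsius.
training_temperatures = [20.0, 21.0, 19.0, 20.5, 19.5]
stable_readings = [20.2, 19.8, 20.1]
shifted_readings = [26.0, 27.0, 25.5]

stable_flag = has_drifted(training_temperatures, stable_readings)
shifted_flag = has_drifted(training_temperatures, shifted_readings)
```

A check like this looks at one feature at a time. Production monitoring usually compares whole distributions across many features, but the principle of comparing live inputs against a training baseline is the same.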
Model Retraining
Often, as you receive new information, you'll want to validate your model, update it, or even retrain it from scratch. There is no universal answer to "How do I know if I should retrain?" but Azure ML event and monitoring tools are good starting points for automation. Once you have decided to retrain, you should pre-process your data, train your new model, compare the outputs of your new model to those of your old model, and use predefined criteria to choose whether to replace your old model. A theme of the above steps is that your retraining should be automated. Azure Machine Learning pipelines are a good answer for creating workflows relating to data preparation, training, validation, and deployment.
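The "predefined criteria" step is easy to automate once the rules are written down. The sketch below shows one possible rule set and is an assumption, not a recommendation from Azure ML: the new model must gain at least `min_gain` on a primary metric, and no other metric may drop by more than `max_drop`. The function name, thresholds, and metric values are hypothetical.

```python
def should_replace(old, new, primary="accuracy", min_gain=0.01, max_drop=0.02):
    """Decide whether the newly trained model should replace the current one.

    old and new map metric names to scores, where higher is better.
    """
    # Rule 1: the primary metric must improve by at least min_gain.
    if new[primary] - old[primary] < min_gain:
        return False
    # Rule 2: no other metric may get worse by more than max_drop.
    for metric, old_value in old.items():
        if metric != primary and old_value - new.get(metric, 0.0) > max_drop:
            return False
    return True


current = {"accuracy": 0.91, "recall": 0.88}

better = should_replace(current, {"accuracy": 0.93, "recall": 0.87})
marginal = should_replace(current, {"accuracy": 0.912, "recall": 0.90})
regressed = should_replace(current, {"accuracy": 0.95, "recall": 0.80})
```

Here only the first candidate would be promoted: the second gains too little on accuracy, and the third loses too much recall. Fixing such rules before retraining keeps the decision repeatable and auditable.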
Summary
MLOps enables ML engineers to apply DevOps best practices to machine learning pipelines. This article focussed on MLOps in the Azure cloud; however, MLOps in general is platform-agnostic. Would you like to learn more? We will be happy to discuss it with you.
Written by Richard Vlas, replenished for Partnership Newsletter by Milan Piskla