AI has become a mainstream topic in many industries, from retail and entertainment to healthcare and manufacturing. One of the practices that helps ML projects get from an idea to production is DevOps for machine learning, often called MLOps. A model's lifecycle from training to deployment must be auditable, if not reproducible.
Here is a quick guide to understanding the main pillars of developing repeatable, high-quality machine learning models in Microsoft Azure that you will need to know to get ready for production ML projects.
If you work as a data scientist or ML engineer, you probably spend a lot of time in Jupyter notebooks doing exploratory data analysis (EDA) and training different versions of ML models. From time to time, ML models need retraining to maintain high prediction quality, measured as accuracy and other metrics. Retraining can be required for many reasons: for example, when a new class is added to an image classifier, when environmental conditions change for sensor readings, or when a new product is added to an e-commerce portfolio and the recommendation system has to stay up to date.
While model training is usually a straightforward task, an issue arises when you need to figure out which model version went into production last and what data was used to train it, so that you don’t repeat the same training process.
So how do you track all of this? I’m glad you asked. Welcome to Azure Machine Learning, a platform that accelerates the end-to-end machine learning lifecycle.
Azure Machine Learning provides all necessary MLOps capabilities, ranging from creating reproducible ML pipelines and reusable software environments, to packaging and deploying models, monitoring for operational and ML issues, as well as retraining your model on new data.
The goal of all these steps is faster experimentation, faster deployment of models into production, and, last but not least, quality assurance and end-to-end lineage tracking.
ML pipelines are used to stitch together all of the steps involved in your model training process. A typical ML pipeline can contain steps from data preparation to feature extraction to hyperparameter tuning to model evaluation. Azure Machine Learning offers two ways to create ML pipelines.
The first one is a code-based approach using the Azure Machine Learning Python SDK. Here, a Pipeline object contains an ordered sequence of one or more pipeline steps. A Pipeline then runs as part of an Experiment. Below I show a simple Python pipeline consisting of data preparation and model training steps:
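As a minimal sketch of what such a pipeline might look like, assuming the Azure Machine Learning Python SDK v1 (the azureml-core and azureml-pipeline packages) and a workspace config.json in the working directory. The compute target name, data path, source directories, script arguments, and train.py are hypothetical placeholders, and PipelineData is only one of the ways to hand the prepared data to the training step. The SDK imports sit inside the function so that the file can be loaded without the SDK or a workspace connection.

```python
def build_and_submit_pipeline(experiment_name="MyExperiment"):
    """Build a two-step pipeline (data prep, then training) and submit it."""
    # Imported lazily so this module loads even where the SDK is not installed.
    from azureml.core import Dataset, Experiment, Workspace
    from azureml.core.compute import ComputeTarget
    from azureml.pipeline.core import Pipeline, PipelineData
    from azureml.pipeline.steps import PythonScriptStep

    # Common workspace objects.
    ws = Workspace.from_config()
    datastore = ws.get_default_datastore()
    compute_target = ComputeTarget(workspace=ws, name="cpu-cluster")  # hypothetical name
    experiment = Experiment(workspace=ws, name=experiment_name)

    # Input data, plus a placeholder for the data handed from step 1 to step 2.
    input_data = Dataset.File.from_files(path=(datastore, "raw/"))  # hypothetical path
    prepped_data = PipelineData("prepped_data", datastore=datastore)

    # Step 1: dataprep.py writes its output to the prepped_data location.
    dataprep_step = PythonScriptStep(
        name="prep_data",
        script_name="dataprep.py",
        source_directory="prep_src",  # hypothetical folder
        compute_target=compute_target,
        arguments=["--prepped_data_path", prepped_data],
        inputs=[input_data.as_named_input("raw_data").as_mount()],
        outputs=[prepped_data],
    )

    # Step 2: train.py reads what step 1 produced.
    train_step = PythonScriptStep(
        name="train",
        script_name="train.py",  # hypothetical script
        source_directory="train_src",  # hypothetical folder
        compute_target=compute_target,
        arguments=["--prepped_data", prepped_data],
        inputs=[prepped_data],
    )

    steps = [dataprep_step, train_step]
    pipeline = Pipeline(workspace=ws, steps=steps)

    # Submitting the pipeline to the experiment starts the run.
    pipeline_run = experiment.submit(pipeline)
    pipeline_run.wait_for_completion()
    return pipeline_run
```

Because the training step lists the output of the data preparation step as its input, the steps run in that order.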
The snippet starts with common Azure Machine Learning objects, retrieving a Workspace, creating a datastore, a compute target, and an experiment. Then, the code creates the objects to hold input data. The data preparation code, stored in dataprep.py, writes delimited files to the output path. These outputs from the data preparation step are passed to the training step.
The steps list holds the two Python script steps, one for data preparation and the other for model training. The code then instantiates the Pipeline object itself, passing in the workspace and the steps list. The call to the experiment's submit method begins the Azure ML pipeline run. As you can see, pipelines give a machine learning project the structure it needs for production use.
The second way of building pipelines is the no-code, visual approach using the Designer. You can access this tool from the Designer selection on the homepage of your workspace. The Designer allows you to drag and drop steps onto the design surface. When you design pipelines visually, the inputs and outputs of each step are shown on screen. You can drag and drop data connections, allowing you to quickly understand and modify the dataflow of your pipeline.
Model registration allows you to store and version your models in the Azure cloud. The model registry makes it easy to organise and keep track of your trained models. Registered models are identified by name and version. Each time you register a model with the same name as an existing one, the registry increments the version. Tags can be attached and also used when searching for a model. When you use the SDK to train a model, you will receive a Run object that can be used to register a model created by an experiment run.
Here is a snippet that shows the registration step, with the model_path parameter referring to the cloud location of the model:
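A minimal sketch of that step, assuming the Python SDK v1, where a Run object exposes register_model. The model name, the outputs/model.pkl path, and the tag are hypothetical placeholders; the run argument is expected to be the Run returned by your training experiment.

```python
def register_from_run(run, model_name="image-classifier"):
    """Register the model file produced by a run and return the registered model.

    `run` is expected to be an azureml.core.Run. The model name, path, and
    tags used here are hypothetical placeholders.
    """
    model = run.register_model(
        model_name=model_name,
        model_path="outputs/model.pkl",  # cloud-side location within the run's outputs
        tags={"stage": "candidate"},     # tags can later be used when searching
    )
    # Name and version together identify the registered model.
    print(model.name, model.version)
    return model
```

Calling this again for a later run with the same model name would produce a new version under that name rather than overwrite the earlier one.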
As a data scientist, you want to be able to keep your focus on model development, rather than having to upskill on the many technologies needed for service hosting such as Kubernetes, Flask, and Swagger. Azure ML helps data scientists seamlessly transition their models into mature and fully featured web services, enabling them to be rapidly adopted in applications. First, you fetch the latest model from the registry and profile it, to see what kind of resources it might need in production.
With the profiling data in hand, you use Azure ML to package the model into a container and deploy it to an Azure Container Instance for real-time or batch inferencing. Azure ML provides automatic schema generation and monitoring of the model once deployed. It’s easy to deploy your models wherever they are needed, from the cloud to the edge, and a deployment can use any of the available compute options, such as CPU, GPU, and FPGA. With just a couple of lines of code, you can push the same model to a more scalable Azure Kubernetes Service instance and even exercise advanced A/B testing capabilities for new model validation.
Once the model is deployed to an Azure Kubernetes Service instance, the web dev team can call the web service endpoint from their application for real-time or batch inferencing needs. The model endpoint continues to be monitored for data drift and other metrics that can be audited as needed.
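The deployment described above can be sketched roughly as follows, assuming the Python SDK v1. The model and service names, score.py, environment.yml, and the CPU and memory sizes are hypothetical placeholders; in practice the sizes would come from the profiling results. The profiling step itself is left out of this sketch.

```python
def deploy_to_aci(workspace, model_name="image-classifier", service_name="classifier-aci"):
    """Deploy the latest registered version of a model to Azure Container Instances."""
    # Imported lazily so this module loads even where the SDK is not installed.
    from azureml.core import Environment
    from azureml.core.model import InferenceConfig, Model
    from azureml.core.webservice import AciWebservice

    # Without an explicit version, the latest registered version is returned.
    model = Model(workspace, name=model_name)

    # Software environment and entry script for the scoring container
    # (both file names are hypothetical).
    env = Environment.from_conda_specification(
        name="inference-env", file_path="environment.yml"
    )
    inference_config = InferenceConfig(entry_script="score.py", environment=env)

    # Resource sizes are placeholders; profiling data would inform real values.
    deployment_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=1)

    service = Model.deploy(
        workspace=workspace,
        name=service_name,
        models=[model],
        inference_config=inference_config,
        deployment_config=deployment_config,
    )
    service.wait_for_deployment(show_output=True)
    return service.scoring_uri
```

Moving to Azure Kubernetes Service would mostly mean swapping in AksWebservice.deploy_configuration and passing an AKS compute target as deployment_target to Model.deploy; the model and inference configuration could stay the same.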
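From the application side, calling the endpoint is an ordinary HTTPS POST. Here is a minimal sketch using only the Python standard library. The {"data": ...} payload shape is an assumption, since the real schema depends on the entry script of the deployed service, and the URL and key in the usage note are placeholders.

```python
import json
import urllib.request


def build_scoring_request(scoring_uri, rows, api_key=None):
    """Build (but do not send) a POST request for a deployed scoring endpoint."""
    # Payload shape is an assumption; it must match what the entry script expects.
    body = json.dumps({"data": rows}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Key-based authentication passes the key as a bearer token.
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(scoring_uri, data=body, headers=headers, method="POST")


def score(scoring_uri, rows, api_key=None, timeout=30):
    """Send the rows to the endpoint and return the decoded JSON response."""
    request = build_scoring_request(scoring_uri, rows, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A call such as score("https://<your-endpoint>/score", [[5.1, 3.5, 1.4, 0.2]], api_key="<key>") would then return whatever the service's entry script produces.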
Often, you'll want to validate your model, update it, or even retrain it from scratch, as you receive new information. There is no universal answer to "How do I know if I should retrain?" but Azure ML event and monitoring tools are good starting points for automation. Once you have decided to retrain, you should pre-process your data, train your new model, compare the outputs of your new model to those of your old model, and use predefined criteria to choose whether to replace your old model. A theme of the above steps is that your retraining should be automated. Azure Machine Learning pipelines are a good answer for creating workflows relating to data preparation, training, validation, and deployment.
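The "predefined criteria" part is the easiest to automate. Below is a minimal sketch in plain Python of one possible rule: replace the old model only if the primary metric improves by a minimum margin and no other metric regresses by more than a tolerance. The function name, metric names, and thresholds are hypothetical, and the rule assumes that higher values are better for every metric.

```python
def should_replace(old_metrics, new_metrics, primary="accuracy",
                   min_gain=0.01, max_regression=0.02):
    """Return True when the candidate model clears the predefined criteria.

    Assumes higher is better for every metric. A metric missing from the
    candidate's results is treated as a failed check.
    """
    # The primary metric must improve by at least min_gain.
    if new_metrics[primary] - old_metrics[primary] < min_gain:
        return False

    # No secondary metric may drop by more than max_regression.
    for name, old_value in old_metrics.items():
        if name == primary:
            continue
        if old_value - new_metrics.get(name, float("-inf")) > max_regression:
            return False
    return True


production = {"accuracy": 0.90, "recall": 0.80}
candidate = {"accuracy": 0.93, "recall": 0.81}
print(should_replace(production, candidate))
```

In a retraining pipeline, a check like this would sit in the validation step, with the registration and deployment steps running only when it passes.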
So now you can assess how structured and well controlled your ML project can get in Azure Machine Learning, allowing you to bring models into production. So, what’s your next step? Let us know.