Machine Learning Operations

#aws #DataEngineering #CM2606

What is MLOps

Important

Machine Learning Operations (MLOps) takes into account the ETL pipelines, deployment methods and machine learning components of a typical workflow. This covers the overall architecture as well as the basic ingestion pipelines.

MLOps is essentially a combination of data engineering, DevOps and machine learning, where an engineer is tasked with developing the ETL framework for a project while also considering sub-domains such as data retention, backups and other related concerns.

Pasted image 20250319090858.png

MLOps vs DevOps

Important

Both MLOps and DevOps take an iterative approach to shipping applications into production; however, MLOps prioritizes getting machine learning models into a deployable state by considering all the aspects required to implement them.

In other words, the key difference between MLOps and DevOps is that:

In MLOps there is an important process called CI/CD (continuous integration/continuous delivery). It is a practice that refers to the ongoing process of recognizing issues, reassessing them and updating machine learning models automatically. CI/CD removes the need to constantly and manually update machine learning models to keep up with new trends and data. The following is an example of a CI/CD pipeline:

Pasted image 20250320135251.png
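
To make the idea concrete, below is a minimal sketch of a retraining-and-validation step that a CI/CD job could run on a schedule. The data path, artifact path, accuracy threshold and model choice are all assumptions for illustration, not part of the original note.

```python
# Hypothetical retraining step a CI/CD job could execute on a schedule.
# Paths, the accuracy threshold and the model are assumptions for the example.
import joblib
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

DATA_PATH = "data/latest_training_data.csv"   # assumed location of refreshed data
MODEL_PATH = "artifacts/model.joblib"         # assumed model artifact location
MIN_ACCURACY = 0.90                           # assumed promotion threshold


def retrain_and_validate() -> bool:
    """Retrain on the newest data and only persist the model if it passes validation."""
    df = pd.read_csv(DATA_PATH)
    X, y = df.drop(columns=["label"]), df["label"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )

    model = RandomForestClassifier(n_estimators=100, random_state=42)
    model.fit(X_train, y_train)

    accuracy = accuracy_score(y_test, model.predict(X_test))
    if accuracy < MIN_ACCURACY:
        # Fail the pipeline run so the previous model keeps serving traffic.
        return False

    joblib.dump(model, MODEL_PATH)  # "continuous delivery" of the new artifact
    return True


if __name__ == "__main__":
    raise SystemExit(0 if retrain_and_validate() else 1)
```

A CI server would run this script after new data lands; a non-zero exit code blocks the release of the new model version.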

ML Pipelines vs ETL Pipelines

Important

In a machine learning pipeline, the flow is similar to an ETL pipeline; however, its core components differ because the incoming data has to be validated before being processed, and the resulting model is deployed at the end of the pipeline.

Refer to the following image to see an example of an ML pipeline:

Pasted image 20250320140212.png
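
As a rough sketch of those stages, the snippet below chains ingest → validate → transform → train, with deployment left as the final step. The function names, validation rules and model choice are illustrative assumptions rather than any specific framework's API.

```python
# Illustrative ML pipeline stages: ingest -> validate -> transform -> train -> deploy.
# Function names, validation rules and the model are assumptions for the example.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler


def ingest(path: str) -> pd.DataFrame:
    return pd.read_csv(path)


def validate(df: pd.DataFrame) -> pd.DataFrame:
    # Unlike a plain ETL pipeline, the data is checked before it reaches the model.
    if df.isnull().any().any():
        raise ValueError("Null values found; refusing to train on incomplete data")
    if "label" not in df.columns:
        raise ValueError("Expected a 'label' column in the training data")
    return df


def transform(df: pd.DataFrame):
    features = df.drop(columns=["label"])
    return StandardScaler().fit_transform(features), df["label"]


def train(X, y) -> LogisticRegression:
    return LogisticRegression(max_iter=1000).fit(X, y)


def run_pipeline(path: str) -> LogisticRegression:
    X, y = transform(validate(ingest(path)))
    model = train(X, y)
    # Deployment would follow here, e.g. persisting the model and exposing it via an API.
    return model
```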

Compared to machine learning pipelines, there are also data orchestration pipelines. Instead of building an ingestion flow that transforms data on its way to a model, the pipeline simply orchestrates where the data is collected from, how it is transformed and where it ends up. The table below summarizes the three main scenarios, and a minimal orchestration sketch follows it:

| Scenario | Primary Persona | Azure Offering | OSS Offering | Canonical Pipe | Strengths |
| --- | --- | --- | --- | --- | --- |
| Model orchestration (machine learning) | Data Scientist | Azure Machine Learning Pipelines | Kubeflow Pipelines | Data -> Model | Distribution, caching, code-first reuse |
| Data orchestration (data prep) | Data Engineer | Azure Data Factory Pipelines | Apache Airflow | Data -> Data | Strongly typed movement, data-centric activities |
| Code & app orchestration (CI/CD) | App Developer / Ops | Azure Pipelines | Jenkins | Code + Model -> App/Service | Most open and flexible activity support, approval queues and phases with gating |
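
Since the table lists Apache Airflow as the OSS offering for data orchestration, here is a minimal sketch of a Data -> Data DAG. It assumes Airflow 2.x, and the extract/transform/load callables and the schedule are placeholders.

```python
# Minimal data-orchestration DAG sketch, assuming Apache Airflow 2.x.
# The extract/transform/load callables and the schedule are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    print("collect raw data from the source system")


def transform():
    print("clean and reshape the data")


def load():
    print("write the result to the target store")


with DAG(
    dag_id="daily_data_prep",
    start_date=datetime(2025, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    load_task = PythonOperator(task_id="load", python_callable=load)

    # Data -> Data: the pipeline only moves and reshapes data; no model is trained.
    extract_task >> transform_task >> load_task
```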

Deployment

This refers to the process of deploying a product into a system that end consumers can access. For machine learning pipelines, this means the model's outputs can be consumed through an API.

In short, Deployment in Machine Learning is the method by which you integrate a machine learning model into an existing production environment to make practical business decisions based on data. It is the last stage in the machine learning lifecycle.
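
One common way to move a trained model into a production environment is to serialize it as an artifact and load that artifact inside the serving application. The sketch below uses scikit-learn and joblib; the dataset, model type and file path are assumptions for illustration.

```python
# Sketch of packaging a trained model for deployment; dataset, model and path are assumptions.
import joblib
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Training side: fit a model and persist the artifact.
X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000).fit(X, y)
joblib.dump(model, "model.joblib")

# Serving side: the production service loads the artifact and makes predictions.
deployed_model = joblib.load("model.joblib")
print(deployed_model.predict(X[:1]))
```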

Application Programming Interface (API)

Important

An API acts as an intermediary (middle man) that gives a user access to a system's data. It enables applications to communicate with each other. In other words, when provided with a pre-defined input, the API responds with an expected outcome.

Machine learning models may be accessed through APIs, as they provide an interactive interface to the model's outputs. In most cases the output is returned in a format such as JSON. Refer to the following image to see how most APIs access machine learning models:

Pasted image 20250320164024.png

The diagram above shows that an observation is fed into the REST API (e.g. Django), which then communicates with the model to get a prediction.
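
As a rough sketch of that request/response flow, the snippet below exposes a prediction endpoint. The note mentions Django; Flask is used here only to keep the example short, and the model path and feature format are assumptions.

```python
# Minimal prediction API sketch using Flask (the note mentions Django; Flask is
# used here only for brevity). Model path and feature format are assumptions.
import joblib
from flask import Flask, jsonify, request

app = Flask(__name__)
model = joblib.load("model.joblib")  # artifact produced by the training pipeline


@app.route("/predict", methods=["POST"])
def predict():
    # The observation arrives as JSON, e.g. {"features": [5.1, 3.5, 1.4, 0.2]}.
    payload = request.get_json()
    prediction = model.predict([payload["features"]])
    return jsonify({"prediction": prediction.tolist()})


if __name__ == "__main__":
    app.run(port=5000)
```

A client would then POST a JSON observation to /predict and receive the prediction back as JSON, matching the flow shown in the diagram.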