Machine Learning Platform

Discover the engineering behind iFood's data science

Daniel Galinkin

💻 Data & AI - Head of ML Engineering at iFood

A machine learning platform is the set of tools, processes and flows built with the following objectives:

  • Make the process of developing and training a machine learning model faster, more reliable, and more reproducible.
  • Make the process of providing a machine learning model for systems in production reliable, scalable and traceable.
  • Make the process of testing, monitoring and evaluating a machine learning model in production transparent, easily accessible and standardized.

In short, it is engineering applied to data science.

State of the art

The field of machine learning platform development, or ML Engineering, is young, and no definitive standards have been established yet.

In more mature fields, such as distributed data processing, it is possible to name widely adopted open source solutions, such as Apache Spark, whose development is supported by large companies that advance the research frontier while defining new standards.

However, when it comes to the state of the art of ML Engineering, the answer is less definitive.

Open source components

Some tools already address parts of the problem, such as:

  • MLflow: manages the development and tracking of machine learning model experiments.
  • AWS SageMaker: an Amazon Web Services service that simplifies running distributed model training and generating microservices based on the trained models.
  • GoJek Feast: a feature store developed by GoJek. It handles generating, processing and serving attributes for consumption by machine learning models.

Each of these tools addresses part of the challenge of getting a machine learning model into production and keeping it there. Building a machine learning platform is more complicated than simply combining them: there are usually business requirements, incompatibilities between the tools and functionality gaps that must be addressed.

Complete platforms

There are also complete platforms, covering every part of the problem, that companies have developed for internal use. These platforms, however, are not publicly available, not even commercially. Some examples are:

Here at iFood, we are developing our own internal machine learning platform, informed by the state of the art in both open source tools and proprietary solutions.

To do this, it is essential to gather requirements and meet the needs of the business and of our customers: the data scientists.

Requirements of a Machine Learning Platform

There are two main pillars when building a machine learning platform: MLOps and Feature Store.

MLOps

The term MLOps is short for Machine Learning Operations, that is, the operationalization of machine learning. It derives from the more popular and widely adopted concept of DevOps: the set of techniques and processes that automate the delivery of high-quality software to production.

MLOps, therefore, is the set of techniques and processes to automate the delivery of machine learning models with high quality standards for production.

The main steps required for this are:

  1. Generate training datasets
  2. Train models
  3. Put models into production (deploy)
  4. Monitor models
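The four steps can be pictured as a gated pipeline in which each stage runs only if the previous one succeeded. The sketch below illustrates that idea under this assumption; `MLOpsPipeline`, `StageResult` and the stage names are hypothetical, and a real platform would typically delegate this to a workflow orchestrator.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class StageResult:
    """Outcome of a single pipeline stage."""
    ok: bool
    detail: str = ""


Stage = Callable[[Dict], StageResult]


class MLOpsPipeline:
    """Runs stages in order and stops at the first failure (hypothetical)."""

    def __init__(self, stages: List[Tuple[str, Stage]]):
        self.stages = stages

    def run(self, context: Dict) -> List[Tuple[str, StageResult]]:
        history = []
        for name, stage in self.stages:
            result = stage(context)
            history.append((name, result))
            if not result.ok:
                # A failed stage blocks everything downstream: an invalid
                # dataset never reaches training, a weak model is never deployed.
                break
        return history


pipeline = MLOpsPipeline([
    ("generate_dataset", lambda ctx: StageResult(True)),
    ("train", lambda ctx: StageResult(False, "validation error above threshold")),
    ("deploy", lambda ctx: StageResult(True)),
    ("monitor", lambda ctx: StageResult(True)),
])
print([name for name, _ in pipeline.run({})])  # ['generate_dataset', 'train']
```

Stopping at the first failure is the property that matters here: deployment and monitoring only ever see artifacts that passed every earlier check.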

1. Generate training datasets

Traditional DevOps is based on code versions: once a new version of the code passes all the necessary validations, it can be used to update the program running in production.

In the case of MLOps, there are two dependencies: the code version and the training data version, and both must be properly validated. A machine learning platform should run these validations automatically whenever either of them changes.
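A minimal sketch of this idea, assuming a model version is identified by a hash of the training code combined with a hash of the training data. The function names, the JSON serialization and the two validation checks (required columns, no null values) are illustrative assumptions, not iFood's implementation.

```python
import hashlib
import json
from typing import Dict, List, Optional, Set


def fingerprint(payload: bytes) -> str:
    """Short content hash used as a version identifier."""
    return hashlib.sha256(payload).hexdigest()[:12]


def validate_rows(rows: List[Dict], required: Set[str]) -> List[str]:
    """Return the problems found; an empty list means the data passed."""
    problems = []
    for i, row in enumerate(rows):
        missing = required - set(row)
        if missing:
            problems.append(f"row {i}: missing columns {sorted(missing)}")
        nulls = sorted(c for c in required & set(row) if row[c] is None)
        if nulls:
            problems.append(f"row {i}: null values in {nulls}")
    return problems


def model_version(code: bytes, rows: List[Dict]) -> str:
    """Combine code and data hashes; a change in either yields a new version."""
    data = json.dumps(rows, sort_keys=True).encode()  # deterministic serialization
    return f"code-{fingerprint(code)}.data-{fingerprint(data)}"


def needs_retraining(current: str, deployed: Optional[str]) -> bool:
    return current != deployed


code = b"def train(rows): ..."
rows = [{"restaurant_id": 1, "prep_minutes": 12.5}]
print(validate_rows(rows, {"restaurant_id", "prep_minutes"}))  # []

deployed = model_version(code, rows)
new_rows = rows + [{"restaurant_id": 2, "prep_minutes": 9.0}]
print(needs_retraining(model_version(code, new_rows), deployed))  # True
```

Because the version string depends on both inputs, new data with unchanged code still triggers validation and retraining, which is exactly the extra dependency MLOps adds over DevOps.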

2. Train models

Once there are new, valid versions of the code and the training data, a new model must be generated. This is potentially costly, since training can require a lot of time and computational resources. A machine learning platform must abstract the allocation of these resources away from the data scientist.
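One way to picture this abstraction: the data scientist declares what the training job needs, and the platform maps that request onto concrete hardware. The sketch below picks the cheapest machine that satisfies the request; the catalog, instance names and prices are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class InstanceType:
    name: str
    cpus: int
    memory_gb: int
    gpus: int
    hourly_cost: float


# Hypothetical catalog: names and prices are placeholders, not real offerings.
CATALOG = [
    InstanceType("small", cpus=4, memory_gb=16, gpus=0, hourly_cost=0.20),
    InstanceType("large", cpus=16, memory_gb=64, gpus=0, hourly_cost=0.80),
    InstanceType("gpu", cpus=8, memory_gb=61, gpus=1, hourly_cost=3.00),
]


@dataclass(frozen=True)
class TrainingRequest:
    """What the data scientist declares; the platform chooses the hardware."""
    cpus: int
    memory_gb: int
    gpus: int = 0


def choose_instance(request: TrainingRequest,
                    catalog: List[InstanceType] = CATALOG) -> Optional[InstanceType]:
    """Return the cheapest instance type that satisfies the request, if any."""
    candidates = [
        it for it in catalog
        if it.cpus >= request.cpus
        and it.memory_gb >= request.memory_gb
        and it.gpus >= request.gpus
    ]
    return min(candidates, key=lambda it: it.hourly_cost, default=None)


print(choose_instance(TrainingRequest(cpus=8, memory_gb=32)).name)         # large
print(choose_instance(TrainingRequest(cpus=4, memory_gb=8, gpus=1)).name)  # gpu
print(choose_instance(TrainingRequest(cpus=64, memory_gb=512)))            # None
```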

3. Put models into production

Once a new model is generated, it must be made available for consumption in an efficient, robust and scalable way to meet business demands. A machine learning platform must shield the data scientist from the challenges of meeting these requirements.
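Robust deployment often means rolling out a new model gradually. The sketch below assumes a canary strategy (one common approach, not necessarily the one used at iFood): a fixed percentage of requests goes to a candidate version and the rest to the stable one. `CanaryRouter` and `bucket` are hypothetical names, and models are plain callables for simplicity.

```python
import hashlib
from typing import Callable, Dict

Model = Callable[[Dict], float]


def bucket(key: str, buckets: int = 100) -> int:
    """Map a key to a stable bucket in [0, buckets)."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % buckets


class CanaryRouter:
    """Sends candidate_percent% of keys to the candidate model, the rest to stable.

    Hashing the request key (for example, a customer id) pins each key to one
    version, so the same customer gets consistent estimates between calls.
    """

    def __init__(self, stable: Model, candidate: Model, candidate_percent: int):
        if not 0 <= candidate_percent <= 100:
            raise ValueError("candidate_percent must be between 0 and 100")
        self.stable = stable
        self.candidate = candidate
        self.candidate_percent = candidate_percent

    def predict(self, key: str, features: Dict) -> float:
        use_candidate = bucket(key) < self.candidate_percent
        model = self.candidate if use_candidate else self.stable
        return model(features)


router = CanaryRouter(stable=lambda f: 30.0, candidate=lambda f: 28.0, candidate_percent=10)
estimates = [router.predict(f"customer-{i}", {}) for i in range(1000)]
print(estimates.count(28.0) / len(estimates))  # fraction routed to the candidate
```

Raising `candidate_percent` step by step while monitoring the candidate lets a bad model be rolled back before it affects most users.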

4. Monitor models

Once the model is being used in production, it is necessary to monitor whether its behavior is as expected.

For example, the distribution of the data used for predictions may change over time (data drift), or the relationship between those inputs and the value being predicted may change (concept drift).

A machine learning platform must be able to generate alerts in these cases, compare model versions and automatically detect regressions.
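Data drift can be quantified feature by feature. The sketch below uses the Population Stability Index (PSI), one common drift metric, chosen here as an assumption; the number of bins and the 0.2 alert threshold are conventional defaults, not values from iFood's platform.

```python
import math
from bisect import bisect_right
from typing import List, Sequence, Tuple


def psi(reference: Sequence[float], current: Sequence[float], bins: int = 10) -> float:
    """Population Stability Index of one feature between two samples.

    Bin edges are the reference sample's quantiles, so each bin holds about
    1/bins of the reference data. Empty bins get a small floor to avoid log(0).
    """
    ref_sorted = sorted(reference)
    edges = [ref_sorted[len(ref_sorted) * k // bins] for k in range(1, bins)]

    def proportions(sample: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for x in sample:
            counts[bisect_right(edges, x)] += 1
        return [max(c / len(sample), 1e-6) for c in counts]

    expected, actual = proportions(reference), proportions(current)
    return sum((a - e) * math.log(a / e) for e, a in zip(expected, actual))


def drift_alert(reference, current, threshold: float = 0.2) -> Tuple[float, bool]:
    """Return the PSI and whether it exceeds the (assumed) alert threshold."""
    score = psi(reference, current)
    return score, score > threshold


last_month = [float(x) for x in range(1000)]  # synthetic reference sample
this_week = [x + 500 for x in last_month]     # the whole distribution shifted
print(drift_alert(last_month, last_month))    # (0.0, False)
print(drift_alert(last_month, this_week)[1])  # True
```

The same pattern extends to comparing model versions: compute a metric for the stable and the candidate model on the same data and alert when the candidate is worse by more than a chosen tolerance.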

Feature Store

A machine learning model is built on features, or attributes. A model that estimates the preparation time of a dish at a restaurant could use, for example, the number of dishes being prepared at that moment. This number would be a feature of the model.
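As an illustration, the kitchen-load feature from this example could be computed from order events as follows. The `(started_at, finished_at)` event schema and the function name are hypothetical assumptions.

```python
from datetime import datetime
from typing import Iterable, Optional, Tuple

OrderInterval = Tuple[datetime, Optional[datetime]]


def dishes_in_preparation(orders: Iterable[OrderInterval], at: datetime) -> int:
    """Count orders whose preparation interval [started, finished) contains `at`.

    `finished` is None while the dish is still being prepared.
    """
    return sum(
        1
        for started, finished in orders
        if started <= at and (finished is None or at < finished)
    )


def t(hh: int, mm: int) -> datetime:
    return datetime(2024, 1, 1, hh, mm)


orders = [
    (t(12, 0), t(12, 15)),   # in progress at 12:10
    (t(12, 5), None),        # still being prepared
    (t(12, 12), t(12, 20)),  # not started yet at 12:10
    (t(11, 40), t(12, 5)),   # already finished
]
print(dishes_in_preparation(orders, at=t(12, 10)))  # 2
```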

A feature store is the service responsible for calculating and storing these attributes, and for making them available for consumption in the model training and prediction phases.

A complete feature store has, among others, the following requirements:

  • Low latency access to attributes calculated in real time.
  • Efficient access and search of historical attributes to generate datasets for model training.
  • Ease of discovery and reuse of existing attributes.
  • Scalability to handle data with high volume, variability and speed of arrival.

A machine learning platform must have a feature store that meets these requirements and is easily accessible in all phases of model development.
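Of these requirements, discovery and reuse are the easiest to show in a few lines: a registry that records what each feature means, which entity it describes and who owns it. `FeatureDefinition`, `FeatureRegistry` and the sample entries are hypothetical; feature stores such as Feast keep a comparable registry of feature definitions.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Optional


@dataclass(frozen=True)
class FeatureDefinition:
    name: str
    entity: str            # what the feature describes, e.g. "restaurant"
    description: str
    owner: str
    tags: FrozenSet[str] = field(default_factory=frozenset)


class FeatureRegistry:
    """Central catalog so teams can find existing features before building new ones."""

    def __init__(self) -> None:
        self._features: Dict[str, FeatureDefinition] = {}

    def register(self, feature: FeatureDefinition) -> None:
        if feature.name in self._features:
            raise ValueError(f"feature {feature.name!r} already exists; reuse it")
        self._features[feature.name] = feature

    def search(self, text: str = "", entity: Optional[str] = None,
               tag: Optional[str] = None) -> List[FeatureDefinition]:
        text = text.lower()
        matches = [
            f for f in self._features.values()
            if text in f"{f.name} {f.description}".lower()
            and (entity is None or f.entity == entity)
            and (tag is None or tag in f.tags)
        ]
        return sorted(matches, key=lambda f: f.name)


registry = FeatureRegistry()
registry.register(FeatureDefinition(
    "restaurant_avg_prep_minutes", "restaurant",
    "Average preparation time of the restaurant's dishes", "logistics-ds",
    frozenset({"delivery"}),
))
registry.register(FeatureDefinition(
    "restaurant_dishes_in_preparation", "restaurant",
    "Dishes being prepared right now (kitchen load)", "logistics-ds",
    frozenset({"delivery", "real-time"}),
))
print([f.name for f in registry.search(tag="real-time")])
# ['restaurant_dishes_in_preparation']
```

Rejecting duplicate names at registration time nudges teams toward reusing an existing feature instead of silently recomputing it under a new definition.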

For example, in a model that predicts how long an order will take to be delivered, important attributes include:

  • The restaurant's average preparation time
  • How overloaded the restaurant kitchen is at the time of ordering
  • The time of the order and whether this corresponds to a peak traffic period in the region
  • Whether it's raining or not
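These attributes could be gathered into a single feature vector at prediction time. In the sketch below, the region name, the peak windows and the feature names are hypothetical; in practice peak periods would be derived from data for each region.

```python
from datetime import datetime, time
from typing import Dict, List, Tuple

# Hypothetical peak windows per region; real ones would be learned from data.
PEAK_WINDOWS: Dict[str, List[Tuple[time, time]]] = {
    "region-a": [(time(11, 30), time(14, 0)), (time(19, 0), time(22, 0))],
}


def is_peak(region: str, ordered_at: datetime) -> bool:
    """True if the order time falls inside one of the region's peak windows."""
    t = ordered_at.time()
    return any(start <= t < end for start, end in PEAK_WINDOWS.get(region, []))


def delivery_features(region: str, ordered_at: datetime, avg_prep_minutes: float,
                      dishes_in_preparation: int, raining: bool) -> Dict[str, object]:
    """Assemble the delivery-time attributes into one feature vector."""
    return {
        "avg_prep_minutes": avg_prep_minutes,
        "dishes_in_preparation": dishes_in_preparation,
        "hour_of_day": ordered_at.hour,
        "is_peak": is_peak(region, ordered_at),
        "is_raining": raining,
    }


print(delivery_features("region-a", datetime(2024, 1, 1, 20, 15), 18.0, 7, True))
```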

The feature store is responsible for storing the historical values of these attributes and making them available for model training. At iFood, this data is large in volume, so this access needs to be efficient and distributed.
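To build training datasets without leaking future information, each training example must use the feature values that were known at the moment of the order, a so-called point-in-time (or as-of) join. The sketch below shows the logic on in-memory tuples with integer timestamps; at iFood's data volumes the same join would run on a distributed engine, and for pandas DataFrames `pandas.merge_asof` implements this kind of join.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Any, Iterable, List, Optional, Tuple


def point_in_time_join(
    labels: Iterable[Tuple[str, int, Any]],
    feature_rows: Iterable[Tuple[str, int, Any]],
) -> List[Tuple[str, int, Optional[Any], Any]]:
    """Attach to each label the latest feature value known at the label's time.

    labels:       (entity_id, timestamp, target)
    feature_rows: (entity_id, timestamp, feature_value)
    Only values with timestamp <= label timestamp are used, so no future
    information leaks into the training data.
    """
    times, values = defaultdict(list), defaultdict(list)
    for entity, ts, value in sorted(feature_rows, key=lambda r: (r[0], r[1])):
        times[entity].append(ts)
        values[entity].append(value)

    dataset = []
    for entity, ts, target in labels:
        i = bisect_right(times[entity], ts)  # number of values at or before ts
        value = values[entity][i - 1] if i > 0 else None
        dataset.append((entity, ts, value, target))
    return dataset


# Timestamps are plain integers (e.g. minutes) to keep the example small.
avg_prep = [("r1", 10, 20.0), ("r1", 20, 25.0), ("r2", 5, 30.0)]
deliveries = [("r1", 15, 40), ("r1", 20, 42), ("r1", 5, 33), ("r2", 6, 50)]
for row in point_in_time_join(deliveries, avg_prep):
    print(row)
```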

Furthermore, it is responsible for making the generated attributes available in real time, so that when a user makes a request, the model can look up the current values with low latency and provide an estimate as soon as possible. After all, we don't want to wait any longer when we're hungry, do we?
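The real-time path can be sketched as a key-value lookup that keeps only the latest value of each feature and rejects stale ones. `OnlineFeatureStore` is a hypothetical in-memory stand-in; a production system would use a low-latency database instead, and the freshness limit is an assumed parameter.

```python
import time
from typing import Any, Callable, Dict, List, Optional, Tuple


class OnlineFeatureStore:
    """In-memory stand-in for a low-latency key-value store.

    Only the latest value per (entity, feature) is kept, so a lookup is a
    single dictionary access. Values older than max_age_seconds are treated
    as missing, so the model never silently uses stale data.
    """

    def __init__(self, max_age_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.max_age_seconds = max_age_seconds
        self.clock = clock
        self._data: Dict[Tuple[str, str], Tuple[float, Any]] = {}

    def write(self, entity: str, feature: str, value: Any) -> None:
        self._data[(entity, feature)] = (self.clock(), value)

    def read(self, entity: str, features: List[str]) -> Dict[str, Optional[Any]]:
        now = self.clock()
        result = {}
        for feature in features:
            entry = self._data.get((entity, feature))
            fresh = entry is not None and now - entry[0] <= self.max_age_seconds
            result[feature] = entry[1] if fresh else None
        return result


now = [0.0]  # fake clock so the example is deterministic
store = OnlineFeatureStore(max_age_seconds=60, clock=lambda: now[0])
store.write("restaurant-42", "dishes_in_preparation", 7)

now[0] = 30.0
print(store.read("restaurant-42", ["dishes_in_preparation", "avg_prep_minutes"]))
# {'dishes_in_preparation': 7, 'avg_prep_minutes': None}
now[0] = 120.0
print(store.read("restaurant-42", ["dishes_in_preparation"]))
# {'dishes_in_preparation': None}
```

Returning `None` for stale values leaves the fallback decision (default value, older model, or an error) to the caller instead of hiding it inside the store.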

Conclusion

Developing machine learning platforms is a new field full of intriguing, cutting-edge technical challenges. It requires knowledge of machine learning, yes, but also of software engineering, big data, DevOps and cloud computing.

Standards are still being defined and refined by companies whose data volumes demand creativity, innovation and engineering to meet these challenges and requirements, and they keep evolving.

Here at iFood, our machine learning platform is constantly developing and evolving, and we will soon be one of the references in this area!

Have you ever thought about working with us and contributing to our platform?
