Scaling AI Case Study, UBER

Edited from UBER’s Engineering blog by Jeremy Hermann, Engineering Manager, and Mike Del Balso, Product Manager, on UBER’s Machine Learning Platform team.

The full technical study is published here – https://eng.UBER.com/michelangelo/

UBER is an AI-powered company that pushes the limits of scale. Its greatest challenge was to integrate machine learning processes across the company, which meant pulling together data processing, machine learning modelling, visualisation, and deployment tools into a single integrated cross-company package.

To meet this demand, UBER built Michelangelo - an internal ‘machine learning-as-a-service platform’ designed to make it easy to scale AI solutions to meet the needs of the business.

Michelangelo enables internal teams to seamlessly build, deploy, and operate machine learning solutions at scale. It is designed to cover the end-to-end ML workflow: manage data, train, evaluate, and deploy models, make predictions, and monitor predictions. The system also supports traditional ML models, time series forecasting, and Deep Learning. 

Michelangelo is deployed across several UBER data centres, leverages specialized hardware, and serves predictions for the most heavily loaded online services at the company.

Before Michelangelo, UBER faced a number of scale-related challenges with building and deploying machine learning models.  

Prior to Michelangelo, there were no systems in place to build reliable, uniform, and reproducible pipelines for creating and managing training and prediction data at scale. It was not possible to train models larger than what would fit on data scientists’ desktop machines, and there was neither a standard place to store the results of training experiments nor an easy way to compare one experiment to another. Most importantly, there was no established path to deploying a model into production. In most cases, the relevant engineering team had to create a custom serving container specific to the project at hand.  

Michelangelo was designed to address these gaps by standardizing the workflows and tools across teams through an end-to-end system that could grow with the business and enable users to easily build and operate machine learning systems at scale.  

How has Michelangelo been used to build and deploy models to solve specific problems at UBER? The platform manages dozens of models across the company for a variety of prediction use cases. Take UBEREATS for example.

Use case: UBEREATS estimated time of delivery model

UBEREATS has several models running on Michelangelo, covering meal delivery time predictions, search rankings, search autocomplete, and restaurant rankings. The delivery time models predict how much time a meal will take to prepare and deliver before the order is issued and then again at each stage of the delivery process.

The UBEREATS app hosts an estimated delivery time feature powered by machine learning models built on Michelangelo.

Predicting meal estimated time of delivery (ETD) is not simple. When an UBEREATS customer places an order it is sent to the restaurant for processing. The restaurant then needs to acknowledge the order and prepare the meal which will take time depending on the complexity of the order and how busy the restaurant is. When the meal is close to being ready, an UBER delivery-partner is dispatched to pick up the meal. Then, the delivery-partner needs to get to the restaurant, find parking, walk inside to get the food, then walk back to the car, drive to the customer’s location (which depends on route, traffic, and other factors), find parking, and walk to the customer’s door to complete the delivery. The goal is to predict the total duration of this complex multi-stage process, as well as recalculate these time-to-delivery predictions at every step of the process.

On the Michelangelo platform, the UBEREATS data scientists used regression models to predict this end-to-end delivery time. Features for the model include information from the request (e.g., time of day, delivery location), historical features (e.g. average meal prep time for the last seven days), and near realtime calculated features (e.g., average meal prep time for the last one hour). These predictions are displayed to UBEREATS customers prior to ordering from a restaurant and as their meal is being prepared and delivered.
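
To make the shape of such a model concrete, the sketch below trains a generic gradient-boosted regression on synthetic data. The feature names, data, and choice of scikit-learn’s GradientBoostingRegressor are illustrative assumptions, not UBEREATS’ actual implementation.

```python
# A minimal sketch of an ETD-style regression model on synthetic data.
# Feature names mirror the categories described above (request, historical,
# and near-real-time features) and are illustrative assumptions only.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 5_000
X = np.column_stack([
    rng.integers(0, 24, n),      # hour_of_day (request feature)
    rng.normal(18, 5, n),        # avg_prep_time_7d, minutes (historical)
    rng.normal(18, 8, n),        # avg_prep_time_1h, minutes (near real time)
    rng.normal(12, 4, n),        # estimated_driving_time, minutes (request)
])
# Synthetic target: total delivery time as a noisy function of the features.
y = 0.6 * X[:, 2] + X[:, 3] + rng.normal(0, 3, n)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = GradientBoostingRegressor().fit(X_train, y_train)
print("MAE (minutes):", mean_absolute_error(y_test, model.predict(X_test)))
```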


Machine learning workflow

The same general workflow exists across almost all machine learning use cases at UBER regardless of the challenge at hand, including classification and regression, as well as time series forecasting. It also applies across different deployment modes, covering both online and offline (as well as in-car and in-phone) prediction use cases. 

Michelangelo is specifically designed to address the following six-step workflow: 

  1. Manage data
  2. Train models
  3. Evaluate models
  4. Deploy models
  5. Make predictions
  6. Monitor predictions
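
Taken together, these six steps can be pictured as a single script. The sketch below strings them together with generic open-source tooling (scikit-learn and joblib) on synthetic data; it illustrates the workflow, not Michelangelo’s own components.

```python
# End-to-end illustration of the six workflow steps with generic tooling;
# this is a sketch of the workflow, not of Michelangelo's actual components.
import joblib
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# 1. Manage data: assemble features and labels (synthetic here).
X, y = make_regression(n_samples=2_000, n_features=10, noise=5.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 2. Train models.
model = Ridge().fit(X_train, y_train)

# 3. Evaluate models.
print("MAE:", mean_absolute_error(y_test, model.predict(X_test)))

# 4. Deploy models: serialise the trained model for a serving process.
joblib.dump(model, "model.joblib")

# 5. Make predictions: the serving process loads the model and scores requests.
served = joblib.load("model.joblib")
predictions = served.predict(X_test[:5])

# 6. Monitor predictions: log predictions for later comparison with outcomes.
for predicted, observed in zip(predictions, y_test[:5]):
    print(f"predicted={predicted:.1f} observed={observed:.1f}")
```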

Let’s now explore how Michelangelo facilitates each stage of this workflow.

Manage data

Finding good features in data is often the hardest part of machine learning, and UBER has found that building and managing data pipelines is typically one of the costliest pieces of a machine learning solution. 

A platform should provide standard tools for building data pipelines to generate features and label data sets for training (and re-training), and feature-only data sets for predicting. These tools should have deep integration with the company’s data lake or warehouses and with the company’s online data serving systems.

The pipelines need to be scalable, incorporate integrated monitoring for data flow and data quality, and support both online and offline training and predicting. Ideally, they should also generate the features in a way that is shareable across teams to reduce duplicate work and increase data quality. Users should adopt best practices such as guaranteeing that the same data generation/preparation process is used at both training time and prediction time.

UBER found that many modelling problems use identical or similar features, and there is substantial value in enabling teams to share features between their own projects, and for teams in different organizations to share features with each other.
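
One common way to meet both goals - shared feature definitions and identical data preparation at training and prediction time - is to route training data generation and online requests through the same feature-building code. The function and feature names below are hypothetical, offered only as a sketch of the idea.

```python
# Sketch of a single feature-preparation function shared by the training
# pipeline and the online prediction path, so both see identical transforms.
# The feature names and transformations are illustrative assumptions.
from datetime import datetime

def build_features(order: dict) -> dict:
    """Turn a raw order record into model features; the same code path is
    used for training data generation and for online prediction requests."""
    placed_at = datetime.fromisoformat(order["placed_at"])
    return {
        "hour_of_day": placed_at.hour,
        "is_weekend": int(placed_at.weekday() >= 5),
        "avg_prep_time_7d": order.get("avg_prep_time_7d", 0.0),
        "avg_prep_time_1h": order.get("avg_prep_time_1h", 0.0),
    }

# Training pipeline: applied to historical orders to build the training set.
training_row = build_features(
    {"placed_at": "2018-06-01T18:30:00", "avg_prep_time_7d": 17.5})

# Online serving: applied to the incoming request before calling the model.
serving_row = build_features(
    {"placed_at": "2018-06-02T12:05:00", "avg_prep_time_1h": 21.0})
print(training_row, serving_row)
```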


Train models

Michelangelo currently supports multiple machine learning model types, including Deep Learning, and lets customer teams add their own model types. The distributed model training system scales up to handle billions of samples and down to small datasets for quick iterations.

After a model is trained, performance metrics are computed and combined into a model evaluation report. At the end of training, the original configuration, the learned parameters, and the evaluation report are saved back to the model repository for analysis and deployment.

Training jobs can be configured and managed through a web UI or an API.
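
As an illustration of configuring a training job programmatically, the snippet below builds a hypothetical HTTP request; the endpoint, field names, and values are invented for this sketch and do not describe Michelangelo’s real API.

```python
# Hypothetical example of submitting a training job through an HTTP API.
# The endpoint, fields, and values are placeholders, not Michelangelo's API.
import json
import urllib.request

job_config = {
    "model_type": "gradient_boosted_trees",                     # assumed name
    "label_column": "delivery_time_minutes",
    "feature_columns": ["hour_of_day", "avg_prep_time_7d", "avg_prep_time_1h"],
    "training_data": "warehouse.eats.delivery_training_set",    # assumed table
    "hyperparameters": {"n_trees": 200, "max_depth": 8},
}

request = urllib.request.Request(
    "https://ml-platform.example.internal/api/v1/training-jobs",  # placeholder
    data=json.dumps(job_config).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# urllib.request.urlopen(request) would submit the job; it is not called here
# because the endpoint above is a placeholder.
```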

 

Evaluate models

Before arriving at the ideal model for a given use case, it is not uncommon to train hundreds of models that do not make the cut. Though not ultimately used in production, the performance of these rejected models guides engineers towards the configuration that performs best.

Keeping track of these trained models, evaluating them, and comparing them to each other are typically big challenges at this volume, so for every trained model Michelangelo stores a record of:

  • Who trained the model
  • Training and test data sets
  • Distribution and relative importance of each feature
  • Model accuracy metrics
  • Summary statistics for model visualization

Different model types can be explored with powerful visualizations.
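
The attributes listed above can be pictured as a small data structure that makes experiments queryable and comparable. The schema below is an assumption made for illustration, not Michelangelo’s actual model repository format.

```python
# Illustrative record of a trained model's metadata, mirroring the attributes
# listed above; the structure is an assumption, not Michelangelo's schema.
from dataclasses import dataclass, field

@dataclass
class ModelRecord:
    trained_by: str            # who trained the model
    training_data: str         # reference to the training data set
    test_data: str             # reference to the test data set
    feature_importance: dict   # relative importance of each feature
    accuracy_metrics: dict     # e.g. mean absolute error
    summary_stats: dict = field(default_factory=dict)  # for visualisation

records = [
    ModelRecord("alice", "etd_train_v12", "etd_test_v12",
                {"avg_prep_time_1h": 0.41, "hour_of_day": 0.22},
                {"mae_minutes": 4.8}),
    ModelRecord("bob", "etd_train_v13", "etd_test_v13",
                {"avg_prep_time_1h": 0.39, "hour_of_day": 0.25},
                {"mae_minutes": 4.5}),
]
# Comparing experiments then becomes a simple query over the stored records.
best = min(records, key=lambda r: r.accuracy_metrics["mae_minutes"])
print("Best model so far was trained by:", best.trained_by)
```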

Deploy models

Michelangelo has end-to-end support for managing model deployment via its UI (user interface) or API (application programming interface). Models from the model repository are deployed to online and offline containers - standalone, executable packages of software that include everything needed to run an application.

In all cases, the required model components are packaged in a ZIP archive and copied to the relevant hosts across UBER’s data centres using UBER’s standard code deployment infrastructure. The prediction containers automatically load the new models and start handling prediction requests. 
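
A stripped-down version of that packaging and loading flow might look like the following; the file names and archive layout are assumptions made for the sketch.

```python
# Sketch of packaging model artifacts into a ZIP archive and loading them in
# a serving process; the file names and archive layout are assumptions.
import json
import zipfile

# Packaging side: bundle the configuration and metadata needed to serve.
artifacts = {
    "config.json": json.dumps({"model_type": "regression", "version": 13}),
    "metadata.json": json.dumps({"trained_by": "alice"}),
}
with zipfile.ZipFile("model_v13.zip", "w") as archive:
    for name, payload in artifacts.items():
        archive.writestr(name, payload)

# Serving side: the prediction container picks up the new archive, unpacks
# it, and starts handling requests with the freshly loaded model.
with zipfile.ZipFile("model_v13.zip") as archive:
    config = json.loads(archive.read("config.json"))
print("Loaded model version:", config["version"])
```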

 

Make predictions

Once models are deployed and loaded by the serving container, they are used to make predictions based on feature data loaded from a data pipeline or directly from a client service. Offline predictions are generated in batches for downstream consumption, while online predictions are returned to the client service over the network.

The highest traffic models produce more than 250,000 predictions per second.
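
The two prediction paths can be contrasted with a toy model: offline scoring runs over whole batches of feature rows, while online scoring handles one request at a time. The model and data below are stand-ins, not UBER’s.

```python
# Toy contrast between offline (batch) and online (single-request) prediction;
# the model and data are stand-ins built with scikit-learn.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
model = LinearRegression().fit(rng.random((100, 3)), rng.random(100))

# Offline: score a whole batch of feature rows produced by a data pipeline.
batch_features = rng.random((10_000, 3))
batch_predictions = model.predict(batch_features)

# Online: score a single feature vector sent by a client service.
online_prediction = model.predict(np.array([[0.5, 0.2, 0.9]]))[0]
print(len(batch_predictions), online_prediction)
```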

 

Monitor predictions

Models are trained and evaluated on historical data. To make sure that a model keeps working well into the future, it is critical to monitor its predictions so as to ensure that the data pipelines are continuing to send accurate data and that the production environment has not changed in a way that makes the model inaccurate. 

To address this, Michelangelo can automatically log and optionally hold back a percentage of the predictions that it makes and then later join those predictions to the observed outcomes (or labels) generated by the data pipeline.
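
In other words, live accuracy can be measured by joining logged predictions to outcomes once they arrive; the toy join below illustrates the idea with invented order IDs and delivery times.

```python
# Toy sketch of joining logged predictions to later-observed outcomes to
# track live accuracy; order IDs, times, and the metric are illustrative.
logged_predictions = {"order_1": 32.0, "order_2": 45.0, "order_3": 28.0}
observed_outcomes = {"order_1": 35.0, "order_2": 41.0, "order_3": 30.0}

# Join on order ID and compute a running accuracy metric (here, MAE).
joined_ids = logged_predictions.keys() & observed_outcomes.keys()
errors = [abs(logged_predictions[k] - observed_outcomes[k]) for k in joined_ids]
mae = sum(errors) / len(errors)
print(f"Live MAE over joined records: {mae:.1f} minutes")
# An alert could fire if this metric drifts beyond an agreed threshold.
```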

Michelangelo – an end-to-end, enterprise-wide machine learning platform

Management 

The last important piece of the system is an API tier consisting of a management application that controls integrations with UBER’s system monitoring and alerting infrastructure. This tier also houses the workflow system that is used to orchestrate the batch data pipelines, training jobs, batch prediction jobs, and the deployment of models both to batch and online containers. 
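
A deliberately simplified picture of such an orchestration tier is sketched below: steps run in dependency order, and a real system would add retries, monitoring, and alerting around each one. The step names and logic are assumptions for illustration.

```python
# Toy sketch of a workflow tier that runs the batch data pipeline, training,
# batch prediction, and deployment steps in dependency order; the step names
# and orchestration logic are illustrative assumptions.
def run_data_pipeline():
    print("extracting and preparing features")

def run_training_job():
    print("training model")

def run_batch_predictions():
    print("scoring offline data")

def deploy_model():
    print("deploying model to batch and online containers")

workflow = [
    ("data_pipeline", run_data_pipeline),
    ("training_job", run_training_job),
    ("batch_predictions", run_batch_predictions),
    ("deployment", deploy_model),
]

for name, step in workflow:
    print(f"running step: {name}")
    step()  # a real orchestrator would also retry, monitor, and alert here
```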

Building on the Michelangelo platform

UBER is planning the following developments to Michelangelo: 

  • AutoML for automatically searching and discovering model configurations that result in the best performing models for given modelling problems.
  • Model visualization. Understanding and debugging models is increasingly important – especially for Deep Learning. Model visualisation tools are needed to enable data scientists to understand, debug, and tune their models and for users to trust the results.
  • Online learning. Most of UBER’s machine learning models directly affect the UBER product in real time. This means they operate in the complex and ever-changing environment of moving things in the physical world. To keep models accurate as this environment changes, models need to change with it. This involves easily updateable model types, faster training and evaluation architecture and pipelines, automated model validation and deployment, and sophisticated monitoring and alerting systems.  
  • Distributed Deep Learning. An increasing number of UBER’s machine learning systems are implementing Deep Learning technologies. The user workflow of defining and iterating on Deep Learning models is sufficiently different from the standard workflow that it needs unique platform support. Deep Learning use cases typically handle larger quantities of data and have different hardware requirements (i.e., graphics processing units).

Acknowledgement – UBER Engineering.
