AI Forecasting for Amazon

How does AI change the science of forecasting, and what new capabilities and limitations does it bring?

Forecasting is a key capability across all types of organisations and can translate directly into financial benefits through better process optimisation, resource management, products, services and marketing. More broadly, it supports wider social goals such as predicting demand for food, water and energy, and anticipating climate change and its effects.

AI has long been used in forecasting, but new methods are improving the science behind it.

To explore this further, let's first look at a case study.

How does Amazon forecast customer demand for 50 million products with accuracy and granularity?
How can the company increase forecast granularity in terms of products, categories, time and horizons?
How can Amazon forecast demand for a new product with no prior sales data?

According to David Salinas, Senior Machine Learning Scientist at Amazon, the answer lies in Deep Learning.

Figure 30. Simple vs Deep Learning neural networks. Image attribution - https://towardsdatascience.com

The main problem that Amazon had to solve was the ratio of time series models to data scientists at the company. Each of Amazon’s 50 million products has its own demand model, so without AI, 50m models would need to be split across the company’s data science staff, leaving each data scientist with an unmanageable number of models to deal with.

Even if this enormous number of forecast models were manageable, there’s the problem of aggregation - how do you roll up and drill down into 50m separate demand models?

The answer is to bring all 50m models into a single holistic structure in which each model informs all the others. This gives users the ability to analyse the system as a whole as well as drill down into product-level detail.

So let’s simplify the science behind this, starting with some simple time series spreadsheets.

Forecasting here means estimating the probability distribution of a time series’ future given its past.

In a time series, the y axis carries a value, and the x axis represents time intervals along which that value occurs. Below is an example of a simple time series in Excel that shows product unit demand on the y axis and historic 1-year intervals on the x axis.

Figure 31. Units of demand

From this, it’s tempting to extrapolate that demand will be 6 units in 2019.

Figure 32. Demand forecast

But we rarely work with perfectly smooth trends, which is where Excel’s FORECAST function can help us predict fluctuating trends.

Figure 33. Demand forecast based on history
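For readers who prefer code to spreadsheets, the sketch below shows the kind of calculation a linear FORECAST performs: it fits a straight line through the historical points by least squares and reads off the value at a future date. The demand figures and the function name linear_forecast are invented for illustration; they are not the data behind the figures in this article.

```python
import numpy as np

# Hypothetical yearly demand, invented for illustration only.
years = np.array([2013, 2014, 2015, 2016, 2017, 2018], dtype=float)
units = np.array([1.0, 2.5, 2.0, 4.0, 4.5, 5.5])


def linear_forecast(x_known, y_known, x_new):
    """Fit y = intercept + slope * x by least squares and evaluate at x_new."""
    x = np.asarray(x_known, dtype=float)
    y = np.asarray(y_known, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # Slope is the covariance of x and y divided by the variance of x.
    slope = ((x - x_mean) * (y - y_mean)).sum() / ((x - x_mean) ** 2).sum()
    intercept = y_mean - slope * x_mean
    return intercept + slope * x_new


print(linear_forecast(years, units, 2019))
```

A straight line is only a reasonable model when the underlying trend is roughly linear; seasonal or accelerating demand needs a richer method.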

Excel also lets us add confidence intervals (levels).

Figure 34. Demand forecast with confidence levels
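As a rough illustration of where such a band comes from, the sketch below fits a straight line to past demand and widens the point forecast by a multiple of the typical historical error. It is a deliberately simplified normal approximation that ignores uncertainty in the fitted line itself, so it should not be read as Excel's exact method. The function name and the sample data are hypothetical.

```python
import numpy as np
from statistics import NormalDist


def forecast_with_interval(x_known, y_known, x_new, confidence=0.95):
    """Return (lower, point, upper) for a straight-line forecast.

    Assumption: errors are roughly normal with constant spread, and the
    uncertainty in the fitted slope and intercept is ignored.
    """
    x = np.asarray(x_known, dtype=float)
    y = np.asarray(y_known, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    slope = ((x - x_mean) * (y - y_mean)).sum() / ((x - x_mean) ** 2).sum()
    intercept = y_mean - slope * x_mean

    # Residual standard error with n - 2 degrees of freedom (two fitted parameters).
    residuals = y - (intercept + slope * x)
    sigma = np.sqrt((residuals ** 2).sum() / (len(x) - 2))

    # Two-sided multiplier, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    point = intercept + slope * x_new
    return point - z * sigma, point, point + z * sigma


# Hypothetical demand for periods 1 to 5, forecasting period 6.
print(forecast_with_interval([1, 2, 3, 4, 5], [1.0, 2.5, 2.0, 4.0, 4.5], 6))
```

The noisier the history, the larger the residual error and the wider the band, which matches the intuition that erratic products are harder to forecast.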

We can add more products as well.

Figure 35. Forecasting demand for more than one product

The example above shows forecasts for just two products, but imagine a plot showing forecasts for 50m products!

In the Amazon model, each individual product forecast model collectively informs all the other models, so the approach needs to factor in the wildly different sales frequencies of different products. In a large marketplace like Amazon this would likely show a ‘Long Tail’ (Power Law) effect, i.e. a small subset of products significantly outsells most other products in the marketplace. In other words, how can you ensure that the high-frequency sales of top-selling commodities don't have a disproportionate effect on forecasts for a niche product?

Figure 36. The Long Tail - common in high volume and high variance data sets.
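One common way to stop best sellers from drowning out niche products in a shared model is to rescale every series by its own typical level before pooling, then convert forecasts back afterwards. The sketch below assumes a scale of one plus the series mean, which is similar in spirit to the scaling described in the DeepAR paper; the exact form used here, the function names and the sales figures are assumptions for illustration.

```python
import numpy as np


def scale_series(history):
    """Divide a sales history by (1 + its mean) so series become comparable.

    The +1 keeps the scale non-zero for products with no sales at all.
    """
    values = np.asarray(history, dtype=float)
    scale = 1.0 + values.mean()
    return values / scale, scale


def unscale(values, scale):
    """Convert scaled values (e.g. a model's forecast) back to real units."""
    return np.asarray(values, dtype=float) * scale


# Hypothetical weekly sales for a best seller and a niche product.
best_seller = [900, 1100, 1000, 1200]
niche_item = [0, 2, 1, 1]

for name, history in [("best seller", best_seller), ("niche item", niche_item)]:
    scaled, scale = scale_series(history)
    print(name, scale, scaled)
```

After scaling, both series sit in a similar numeric range, so a shared model sees their shapes rather than their raw volumes.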

We need to consider seasonality and other predictable influences, but there are inevitably going to be unforeseen influencing factors as well.

Neural networks are often applied in scenarios like this because they can learn from examples and capture hidden and non-linear dependencies.

The approach taken by Amazon is to run each demand forecast model through a neural network to build a ‘master model’ that transfers learning across all its constituent models.

Figure 37. DeepAR simplified

In this approach multiple time series models are hierarchically organised and can be aggregated at several levels of product categories. These “hierarchical time series” can be forecast “bottom-up” or “top-down”.
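The two directions can be sketched in a few lines. Bottom-up adds product forecasts together to get a category forecast; top-down starts with a category forecast and shares it out among products, here in proportion to each product's historical sales. The product names, numbers and function names are hypothetical, and proportional splitting is only one of several possible top-down rules.

```python
# Hypothetical two-level hierarchy: product -> category.
category_of = {"novel": "books", "cookbook": "books", "kettle": "kitchen"}


def bottom_up(product_values, category_of):
    """Sum product-level values into category-level totals."""
    totals = {}
    for product, value in product_values.items():
        category = category_of[product]
        totals[category] = totals.get(category, 0.0) + value
    return totals


def top_down(category_forecast, historical_sales, category_of):
    """Split each category forecast by each product's share of past sales.

    Assumes every category has non-zero historical sales.
    """
    category_history = bottom_up(historical_sales, category_of)
    return {
        product: category_forecast[category_of[product]]
        * sales / category_history[category_of[product]]
        for product, sales in historical_sales.items()
    }


product_forecast = {"novel": 120.0, "cookbook": 30.0, "kettle": 50.0}
print(bottom_up(product_forecast, category_of))

history = {"novel": 80.0, "cookbook": 20.0, "kettle": 10.0}
print(top_down({"books": 200.0, "kitchen": 40.0}, history, category_of))
```

Bottom-up preserves product-level detail but inherits the noise of sparse series; top-down is smoother at the top but can miss shifts in the mix between products.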

The method that Amazon uses to achieve this is DeepAR - Deep Autoregressive Recurrent Networks.

Autoregression is a time series model that uses observations from previous time steps as input to an equation to predict the value at the next time step. Products are categorised, and each learns from similar products as well as from the master model. The resulting model gives a high degree of forecast accuracy and can even predict demand for new products with no sales history.
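To make the idea of autoregression concrete, here is a minimal sketch of a plain linear autoregressive model fitted by least squares. This is the classical building block only, not DeepAR itself, which replaces the linear equation with a recurrent neural network trained across many series. The function names and the toy series are assumptions for illustration.

```python
import numpy as np


def fit_ar(series, order):
    """Fit next_value = intercept + weighted sum of the previous `order` values."""
    y = np.asarray(series, dtype=float)
    # Each row holds `order` consecutive past values, oldest first;
    # the matching target is the value that immediately follows them.
    lagged = np.array([y[t - order:t] for t in range(order, len(y))])
    design = np.column_stack([np.ones(len(lagged)), lagged])
    targets = y[order:]
    coeffs, *_ = np.linalg.lstsq(design, targets, rcond=None)
    return coeffs  # [intercept, weight of oldest lag, ..., weight of newest lag]


def predict_next(series, coeffs):
    """Apply the fitted equation to the most recent observations."""
    order = len(coeffs) - 1
    recent = np.asarray(series[-order:], dtype=float)
    return float(coeffs[0] + recent @ coeffs[1:])


# Toy series in which each value is half the previous value plus one.
toy = [10.0, 6.0, 4.0, 3.0, 2.5, 2.25, 2.125]
coeffs = fit_ar(toy, order=1)
print(coeffs, predict_next(toy, coeffs))
```

Because the toy series follows its rule exactly, the fit recovers an intercept of about 1 and a lag weight of about 0.5; real demand data is noisy, so the recovered weights would only approximate the underlying pattern.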

The example below shows a plot from a model built using historical data. The x axis shows time, and the y axis shows products sold. Left of the green vertical line is the training data. To the right of the green vertical line is the data generated by the DeepAR model. The black line shows actual sales (the target for the model); the light blue line shows what the model predicted; and the shaded blue area shows the upper and lower confidence intervals.

Figure 38. DeepAR output. Image source https://www.groundai.com/project/deepar-probabilistic-forecasting-with-autoregressive-recurrent-networks/
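A plot like this can be built from a set of simulated futures. Probabilistic forecasting models typically generate many possible sample paths, and the central line and shaded band are then just percentiles taken across those paths at each time step. The sketch below uses random numbers as a stand-in for real model output, so the values carry no meaning; the function name and the 10th/90th percentile choice are assumptions.

```python
import numpy as np


def summarise_sample_paths(sample_paths, lower=10, upper=90):
    """Return (lower band, median, upper band) for each future time step.

    sample_paths has shape (number of simulated futures, forecast horizon).
    """
    paths = np.asarray(sample_paths, dtype=float)
    low, median, high = np.percentile(paths, [lower, 50, upper], axis=0)
    return low, median, high


# Stand-in for model output: 200 simulated futures over a 7-step horizon.
rng = np.random.default_rng(seed=0)
sample_paths = 20 + rng.normal(0, 2, size=(200, 7)).cumsum(axis=1)

low, median, high = summarise_sample_paths(sample_paths)
for step, (lo_v, mid_v, hi_v) in enumerate(zip(low, median, high), start=1):
    print(f"step {step}: {lo_v:.1f} to {hi_v:.1f} (median {mid_v:.1f})")
```

In this stand-in the band widens with the horizon because each simulated path accumulates random steps, which mirrors the general pattern that uncertainty grows the further ahead you forecast.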


This method can be used for scenarios as diverse as energy consumption of individual households, greenhouse optimisation, loads for servers in a data centre, and staff scheduling.

So, can we extrapolate from this to say that the problem of large-scale, accurate and granular forecasts is solved?

The short answer to this is ‘no’.

Forecasting is hard in many cases because there may be too many factors or hidden/unknown factors influencing the outcome.

Take, for example, the question of whether we can now predict financial market futures.

Every financial crisis is caused by different factors, so an approach based on working out the probability of a future given its past can only, at best, work partially.

The problem domain is critical, as is the availability of data on influencing factors. Whilst Amazon is a massive marketplace, it is clearly bounded and it generates data on those aspects of its operations that are relevant to making forecasts. Systems like the financial markets are very different, with different kinds of boundaries, magnitudes and input features. Many of their influencing features are highly intangible - e.g. the rise of populism and its effects on global trade, or the effects of rising temperatures on macro-economic activity.

Yes, AI gives forecasters increasingly powerful tools. However, forecast accuracy depends on data first. Regardless of how powerful your analytical tools are, a core forecasting skill is being able to pick and weigh the right data input features. Paul Saffo, writing in the July-August issue of the Harvard Business Review, proposes a dynamic “cone of uncertainty” against which a forecaster needs to pick the right input features along a continuum between the highly probable and the wildly impossible.

So, the forecast for the future of forecasting is that AI should continue to make some kinds of forecasting more accurate and granular. AI may also replace the kind of modelling work done by humans that can be subsumed into master models. However, humans are still essential for defining forecast problems, defining and weighing inputs, selecting and preparing data, and building and running forecasting models.

My hope - and forecast - is that powerful forecasting tools will become more widely accessible and put accurate forecasting into the hands of non-experts.

Thanks to the Centre for Marketing Analytics and Forecasting for their Forecasting with Artificial Intelligence workshop in London, 26th October 2018.