# Time Series Forecasting

This example shows how to use Prophet and Dask together for scalable time series forecasting.

Prophet is a procedure for forecasting time series data based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects.
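As a rough sketch of what "additive" means here, the decomposition described in the Prophet paper can be written as a sum of components. The symbols are the paper's, not names used elsewhere in this notebook: g(t) is the non-linear trend, s(t) the periodic (yearly, weekly, daily) seasonality, h(t) the holiday effects, and ε_t an error term.

```latex
y(t) = g(t) + s(t) + h(t) + \varepsilon_t
```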

As discussed in the Forecasting at Scale paper, large datasets aren’t the only type of scaling challenge teams run into. In this example we’ll focus on the third type of scaling challenge identified in that paper:

[I]n most realistic settings, a large number of forecasts will be created, necessitating efficient, automated means of evaluating and comparing them, as well as detecting when they are likely to be performing poorly. When hundreds or even thousands of forecasts are made, it becomes important to let machines do the hard work of model evaluation and comparison while efficiently using human feedback to fix performance problems.

That sounds like a perfect opportunity for Dask. We’ll use Prophet and Dask together to parallelize the diagnostics stage of research. This example does not attempt to parallelize the training of the model itself.

[1]:

# This example currently relies on Prophet master
import subprocess

subprocess.call([
"pip",
"install",
])

[1]:

0

[2]:

import pandas as pd
from fbprophet import Prophet

/home/travis/miniconda/envs/test/lib/python3.7/site-packages/fbprophet/diagnostics.py:10: TqdmExperimentalWarning: Using tqdm.autonotebook.tqdm in notebook mode. Use tqdm.tqdm instead to force console mode (e.g. in jupyter console)
from tqdm.autonotebook import tqdm
Importing plotly failed. Interactive plots will not work.


We’ll walk through the example from the Prophet quickstart. These values represent the log of daily page views for Peyton Manning’s Wikipedia page.

[3]:

df = pd.read_csv(
parse_dates=['ds']
)

[3]:

|   | ds         | y        |
|---|------------|----------|
| 0 | 2007-12-10 | 9.590761 |
| 1 | 2007-12-11 | 8.519590 |
| 2 | 2007-12-12 | 8.183677 |
| 3 | 2007-12-13 | 8.072467 |
| 4 | 2007-12-14 | 7.893572 |

[4]:

df.plot(x='ds', y='y');


Fitting the model takes a handful of seconds. Dask isn’t involved at all here.

[5]:

%%time
m = Prophet(daily_seasonality=False)
m.fit(df)

CPU times: user 2.59 s, sys: 149 ms, total: 2.74 s
Wall time: 3.01 s

[5]:

<fbprophet.forecaster.Prophet at 0x7f5b82146610>


And we can make a forecast. Again, Dask isn’t involved here.

[6]:

future = m.make_future_dataframe(periods=365)
forecast = m.predict(future)
m.plot(forecast);


## Parallel Diagnostics

Prophet includes a fbprophet.diagnostics.cross_validation function, which uses simulated historical forecasts to provide some idea of a model’s quality.

This is done by selecting cutoff points in the history, and for each of them fitting the model using data only up to that cutoff point. We can then compare the forecasted values to the actual values.
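To make the cutoff selection concrete, here is a minimal sketch of one way such cutoffs could be generated: start one horizon before the last observation and step backward by a fixed period until too little history would remain. This is a simplified illustration, not Prophet's actual implementation. The function name `sketch_cutoffs` is hypothetical, and the end date of the series is an assumption made for the example.

```python
from datetime import date, timedelta


def sketch_cutoffs(first_day, last_day, initial_days, period_days, horizon_days):
    """Walk backward from (last_day - horizon) in steps of `period_days`,
    stopping once fewer than `initial_days` of history would remain."""
    earliest = first_day + timedelta(days=initial_days)
    cutoff = last_day - timedelta(days=horizon_days)
    cutoffs = []
    while cutoff >= earliest:
        cutoffs.append(cutoff)
        cutoff -= timedelta(days=period_days)
    return sorted(cutoffs)


# Assumed date range, for illustration only: the series starts on 2007-12-10
# and is taken to end on 2016-01-20.
cutoffs = sketch_cutoffs(
    first_day=date(2007, 12, 10),
    last_day=date(2016, 1, 20),
    initial_days=730,
    period_days=180,
    horizon_days=365,
)
for c in cutoffs:
    print(c)
```

With the assumed date range, this yields 11 cutoffs, from 2010-02-15 to 2015-01-20. Each cutoff defines one independent "fit on the past, score on the next horizon" task.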

Internally, cross_validation generates a list of cutoff values to try. Prophet fits a model and computes some metrics for each of these. By default each model is fit sequentially, but the models can be trained in parallel using the parallel= keyword. On a single machine parallel="processes" is a good choice. For large problems where you’d like to distribute the work on a cluster, use parallel="dask" after you’ve connected to the cluster by creating a Client.
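The reason this parallelizes well is that each cutoff is an independent task. The sketch below illustrates that pattern with the standard library only. `evaluate_cutoff` is a hypothetical stand-in for "fit a model on data up to the cutoff and score its forecast", and a thread pool is used purely to keep the example small; it is not how Prophet implements `parallel=`.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy "time series": the value at step t is simply t.
SERIES = list(range(100))


def evaluate_cutoff(cutoff):
    """Hypothetical stand-in for one cross-validation task:
    use only data up to `cutoff`, predict the next point, return the error."""
    history = SERIES[: cutoff + 1]
    prediction = sum(history) / len(history)  # naive "model": mean of history
    actual = SERIES[cutoff + 1]
    return cutoff, abs(actual - prediction)


cutoffs = [20, 40, 60, 80]

# Sequential evaluation, one cutoff after another.
sequential = [evaluate_cutoff(c) for c in cutoffs]

# The same tasks submitted to a pool of workers; results keep input order.
with ThreadPoolExecutor(max_workers=2) as pool:
    parallel = list(pool.map(evaluate_cutoff, cutoffs))

print(parallel == sequential)
```

Because no task depends on another's result, the same map-over-cutoffs structure can be handed to a process pool or to a cluster scheduler without changing the per-cutoff logic.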

[7]:

import dask
from distributed import Client, performance_report
import fbprophet.diagnostics

# Start a local Dask cluster and connect to it (default settings assumed)
client = Client()
client

[7]:


### Cluster

• Workers: 2
• Cores: 2
• Memory: 8.36 GB
[8]:

%%time
df_cv = fbprophet.diagnostics.cross_validation(
    m, initial="730 days", period="180 days", horizon="365 days",
    parallel="dask",
)

INFO:fbprophet:Making 11 forecasts with cutoffs between 2010-02-15 00:00:00 and 2015-01-20 00:00:00

CPU times: user 2.91 s, sys: 263 ms, total: 3.17 s