Predict Time Series with Python

The Complete Guide to Time Series Forecasting Using Sklearn, Pandas, and Numpy

A hands-on tutorial and framework to use any scikit-learn model for time series forecasting in Python

Introduction

There are many so-called traditional models for time series forecasting, such as the SARIMAX family of models, exponential smoothing, or BATS and TBATS.

However, we rarely mention the most common machine learning models for regression, such as decision trees, random forests, gradient boosting, or even a support vector regressor. We see these models applied extensively in typical regression problems, but far less often in time series forecasting.

That is the reason for this article! Here, we design a framework that turns a time series problem into a supervised learning problem, allowing us to use any model we want from our favorite library: scikit-learn!
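Here is a minimal sketch of that framing, built on our own assumptions. Each row of the feature matrix holds the previous n_lags values of the series, and the target is the current value. The helper make_lag_features, the toy trend-plus-sine series and the choice of three lags are hypothetical and do not come from the article's repository. To keep the example down to NumPy and pandas, we fit ordinary least squares with np.linalg.lstsq. Any scikit-learn regressor, such as RandomForestRegressor, could be fitted on the same X and y with model.fit(X, y).

```python
import numpy as np
import pandas as pd


def make_lag_features(series: pd.Series, n_lags: int):
    """Hypothetical helper: frame a univariate series as a supervised problem.

    Each row of X holds the n_lags previous values; y holds the current value.
    """
    frame = pd.DataFrame({"y": series})
    for lag in range(1, n_lags + 1):
        # shift(lag) moves values down, so row t sees the value from t - lag
        frame[f"lag_{lag}"] = series.shift(lag)
    frame = frame.dropna()  # the first n_lags rows have incomplete history
    X = frame[[f"lag_{lag}" for lag in range(1, n_lags + 1)]]
    return X, frame["y"]


# Toy series (hypothetical data): a linear trend plus a small oscillation
t = np.arange(60)
series = pd.Series(0.5 * t + np.sin(t / 3.0))

X, y = make_lag_features(series, n_lags=3)

# Ordinary least squares with an intercept column; a scikit-learn model
# would instead be fitted with model.fit(X, y)
design = np.column_stack([np.ones(len(X)), X.to_numpy()])
coef, *_ = np.linalg.lstsq(design, y.to_numpy(), rcond=None)

# One-step-ahead forecast from the last three observations, ordered like
# the columns lag_1, lag_2, lag_3 (most recent first)
last_lags = series.iloc[-1:-4:-1].to_numpy()
next_value = coef[0] + last_lags @ coef[1:]
print(X.shape, round(float(next_value), 3))
```

Note that dropna() discards the first n_lags rows because they lack a full history. For multi-step forecasts, one option is to append each one-step prediction to the series and repeat the process.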

By the end of this article, you will have the tools and knowledge to apply any machine learning model for time series forecasting along with the statistical models mentioned above.

The full source code is available on GitHub.


Preparing the dataset

First, we import all the libraries required to complete our tutorial.

import numpy as np
import pandas as pd
import statsmodels.api as sm
import matplotlib.pyplot as plt

Here, we use the statsmodels library to import the dataset, which is the weekly CO2 concentration from 1958 to 2001.

data = sm.datasets.co2.load_pandas().data

A good first step is to visualize our data with the following code block.

fig, ax = plt.subplots(figsize=(16, 11))
ax.plot(data['co2'])
ax.set_xlabel('Time')…


Time Series Forecasting in Python: A Quick Practical Guide


In the past, people consulted shamans to learn what the weather would be like in the upcoming months: whether it would be a favorable season for crops or whether there would be a drought. Back when a nation’s livelihood relied heavily on the mercy of the elements, people also relied on these spiritual guides to ease their worries.

While fortune-tellers are not as highly regarded in the 21st century, we’re still very much seeking accurate predictions to understand patterns in:

  • Weather conditions
  • Population movement
  • Stock prices
  • Business strategy improvement
  • Many other areas

Because we live in modern times, of course, we look to the future through entirely different means. One such means is time series forecasting. In this tutorial, we will briefly explain the idea of forecasting before using Python to make predictions based on a simple autoregressive model. We’ll also compare the results with the actual values for each period.

Without much ado, let’s cut to the chase.

What is Time Series Forecasting?

A time series is a sequence of data points collected over a period of time. Time series forecasting is the process of analyzing that data, finding patterns in it, and drawing conclusions about future values that help us with our long-term goals.

In simpler terms, when we’re forecasting, we’re basically trying to “predict” the future. This word should sound familiar since we often hear it on the news, be it in relation to the weather, politics, or another topic altogether. Colloquially, we use “predict” and “forecast” interchangeably, but there is a subtle distinction between the two.

In time series, we expect patterns to persist as we progress through time. Therefore, we follow a simple structure:

Of course, finding the pattern is just a fancy way of saying we need to select the correct model, so we’re already halfway done. All that’s left is to make the predictions.

How to Make Predictions Using Time Series Forecasting in Python?

We follow three main steps when making predictions with time series forecasting in Python: fitting the model, specifying the time interval, and analyzing the results.

Fitting the Model

Let’s assume we’ve already created a time series object and loaded our dataset into Python. Our notebook should already contain the code for:

  • Scraping the data
  • Creating returns
  • Normalizing said returns
  • Separating the training and testing sets
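The preparation steps listed above might look roughly like the following sketch. Everything in it is an assumption made for illustration. The prices are simulated rather than scraped, the 80/20 split ratio is our choice, and the z-score is only one way to normalize returns, since the article does not say which method it uses. The points that matter are that the split is chronological, with no shuffling, and that the normalization statistics come from the training set only.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for scraped FTSE closing prices on business days
rng = np.random.default_rng(42)
dates = pd.bdate_range("2010-01-04", periods=500)
prices = pd.Series(5000 * np.exp(np.cumsum(rng.normal(0, 0.01, len(dates)))),
                   index=dates, name="ftse")

df = prices.to_frame()
# Percentage returns; the first row has no previous price, so it is NaN
df["ret"] = df["ftse"].pct_change() * 100

# Chronological split: no shuffling, the test set follows the training set
size = int(len(df) * 0.8)
df_train, df_test = df.iloc[:size].copy(), df.iloc[size:].copy()

# One possible normalization (an assumption): z-scores built from training
# statistics only, so nothing from the test period leaks into training
mu, sigma = df_train["ret"].mean(), df_train["ret"].std()
df_train["norm_ret"] = (df_train["ret"] - mu) / sigma
df_test["norm_ret"] = (df_test["ret"] - mu) / sigma

print(len(df_train), len(df_test), df_train.index[-1].date(), df_test.index[0].date())
```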

Before we proceed, make sure to run that code so the data is loaded and ready:

Running the code for a pre-loaded dataset in Python using the Run All Above command.

The first bit of coding we need to do ourselves involves fitting the model. As we mentioned earlier, we’re going to start with a simple autoregressive model and see how predictions evolve over time. We can name the variable model_ar:

Preparing to fit an autoregressive model in Python for time series forecasting purposes.

To minimize the changes we’ll need to make later, let’s use the ARIMA class instead of ARMA. We’re also going to model the FTSE values. Lastly, we set the order to (1, 0, 0): one autoregressive lag, no differencing, and no moving-average terms. This is the ARIMA equivalent of an AR(1) model:

Fitting a time series forecasting model in Python using ARIMA, setting the order to 1,0,0.

Of course, we also need to store the fitted results before moving on:

Storing the fitted results of a time series forecasting ARIMA model in Python.
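In the article, the model itself comes from statsmodels. In current releases the class is statsmodels.tsa.arima.model.ARIMA, and calling .fit() on it returns the results object stored as results_ar. To show what an order of (1, 0, 0) actually estimates, here is a minimal NumPy sketch that fits x_t = c + φ·x_{t-1} + e_t by ordinary least squares on simulated data. The function fit_ar1_ols and the simulated parameters are hypothetical. statsmodels estimates ARIMA models by maximum likelihood, so its numbers will differ slightly from these.

```python
import numpy as np


def fit_ar1_ols(x):
    """Hypothetical helper: estimate c and phi in x_t = c + phi * x_{t-1} + e_t."""
    x = np.asarray(x, dtype=float)
    design = np.column_stack([np.ones(len(x) - 1), x[:-1]])  # columns [1, x_{t-1}]
    (c, phi), *_ = np.linalg.lstsq(design, x[1:], rcond=None)
    return c, phi


# Simulate an AR(1) with known parameters to check the estimator (hypothetical data)
rng = np.random.default_rng(0)
true_c, true_phi = 2.0, 0.8
x = np.empty(5000)
x[0] = true_c / (1 - true_phi)  # start at the long-run mean, 10.0
for t in range(1, len(x)):
    x[t] = true_c + true_phi * x[t - 1] + rng.normal()

c_hat, phi_hat = fit_ar1_ols(x)
print(round(c_hat, 2), round(phi_hat, 3))
```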

Specifying the Time Interval

Next up, we must specify the time interval for our forecast. It could be a day, a week, or any other period we choose. The forecast naturally starts on the first date for which the model has no training data. In other words, we’re looking for the first day after the end of the training set.

Calling the tail() method on the training set displays its last few rows, and our forecast starts on the first business day after the last date shown. For example, if the last date in our dataset is July 14, 2014, we select July 15 as our first prediction. For convenience, we store that date in a variable called start_date:
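Instead of reading the start date off tail() by hand, we can let pandas compute it. The sketch below assumes the training set has a DatetimeIndex whose last entry is July 14, 2014. The value is typed in here to stand for df_train.index[-1]. pd.offsets.BDay counts Monday through Friday as business days and does not account for exchange holidays.

```python
import pandas as pd

# Hypothetical stand-in for the last date shown by tail(): July 14, 2014, a Monday
last_train_date = pd.Timestamp("2014-07-14")

# First business day after the training set ends
start_date = last_train_date + pd.offsets.BDay(1)
print(start_date.date())  # 2014-07-15

# The same rule skips weekends: a Friday rolls over to the following Monday
print((pd.Timestamp("2014-07-18") + pd.offsets.BDay(1)).date())  # 2014-07-21
```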

Calling the tail method in Python and specifying a start date interval for a time series forecast.

Similarly, we can store the last date of our interval in a variable called end_date. We can set it to any day after July 14, 2014. The longer the period, the harder it is to see how the data moves between individual dates. For this reason, let’s pick a mid-range value: January 1, 2015.

Calling the tail method in Python and specifying an end date interval for a time series forecast.

This date can be changed at any point, as long as the new one falls on a business day. Otherwise, Python will throw an error.
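You can check a candidate end date before calling predict(). This sketch assumes that "business day" means Monday to Friday. The helper is_business_day is hypothetical, and like BDay it ignores exchange holidays. If a date falls on a weekend, BDay().rollforward() moves it to the next business day.

```python
import pandas as pd


def is_business_day(date) -> bool:
    """Hypothetical helper: True if the date falls Monday to Friday."""
    return bool(pd.Timestamp(date).dayofweek < 5)


end_date = pd.Timestamp("2015-01-01")  # a Thursday
print(is_business_day(end_date))  # True

# If an end date lands on a weekend, rollforward() moves it to the next business day
weekend_date = pd.Timestamp("2015-01-03")  # a Saturday
print(pd.offsets.BDay().rollforward(weekend_date).date())  # 2015-01-05
```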

After setting everything up, we can finally make a forecast using the predict() method. We call predict() on the fitted results, results_ar, and store the output in a new variable called df_pred:

Making a time series forecast with the predict method in Python, then storing the results in an array equal to the results variable.

Inside the parentheses, we set the “start” and “end” arguments to the start_date and end_date variables we defined earlier:

Setting the start and end arguments of the time series forecast to equal the time interval specified earlier.
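For an AR(1), an out-of-sample prediction between start and end comes down to a simple recursion. The first forecast uses the last observed value, and each later forecast feeds the previous one back in. The function ar1_forecast below is a hypothetical re-implementation for illustration, not the statsmodels code behind results_ar.predict(), and its parameter values are made up. It returns a pandas Series indexed by business days, so the result can be plotted against dates.

```python
import numpy as np
import pandas as pd


def ar1_forecast(c, phi, last_value, start_date, end_date):
    """Hypothetical helper: recursive AR(1) forecasts for each business day.

    Each step feeds the previous forecast back in: x_hat = c + phi * x_hat_prev.
    """
    index = pd.bdate_range(start_date, end_date)
    preds = np.empty(len(index))
    prev = last_value
    for i in range(len(index)):
        prev = c + phi * prev
        preds[i] = prev
    return pd.Series(preds, index=index, name="prediction")


# Made-up parameters and last observed price, chosen only for illustration
df_pred = ar1_forecast(c=100.0, phi=0.98, last_value=6700.0,
                       start_date="2014-07-15", end_date="2015-01-01")
print(len(df_pred), df_pred.index[0].date(), df_pred.index[-1].date())
```

With these numbers the long-run mean is 100 / (1 - 0.98) = 5000. The last value lies above it, so every forecast is lower than the previous one.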

To get an idea of what our predictions look like, let’s graph them using the plot() method:

Preparing a graph for the time series predictions, using Python’s plot function.

We can also specify the color of the plotted time series through the color argument. By convention, we use blue for actual values and red for predicted ones, so let’s set the predictions to red:

Defining a color argument for the plotted time series – blue for actual values and red for predictions.

By all means, remember to define the appropriate figure size and set a title. A name like “Predictions” seems fitting, so let’s set it like that:

Assigning title for the time series forecast as “Predictions” and the figure size as 24.
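Put together, the plotting steps might look like the sketch below. The forecast values are synthetic stand-ins for df_pred. The figure size and title font size are illustrative choices, not the article's exact settings. The Agg backend line only lets the example run without a display.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic forecast standing in for df_pred: decays from above toward 5000
index = pd.bdate_range("2014-07-15", "2015-01-01")
df_pred = pd.Series(5000 + 1700 * 0.98 ** np.arange(1, len(index) + 1), index=index)

ax = df_pred.plot(figsize=(20, 5), color="red")  # red for predictions
plt.title("Predictions", size=24)
plt.show()
```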

Analyzing the Results

After adding the plt.show() command, we can run the cell to see our results:

Running Python’s plt.show command to see the forecast results.

We see a constantly decreasing line which isn’t at all realistic. In practice, we don’t expect prices to constantly decline. That would mean that the price today is as high as it is ever going to be, and the price tomorrow will always be lower. If that was the case, sellers would always want to get rid of their investments and buyers would have an incentive to hold off before they purchase. That way, everybody would be trying to sell, but nobody would be willing to buy.

The issue here comes from our model of choice. Because we’re using a simple AR model, the predictions are based only on the constant and the value from the previous period. Once we move past the last observed price, each prediction is the constant plus a fixed fraction of the previous prediction, so the curve slides steadily toward the model’s long-run mean.
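Written out, the recursion looks as follows. The notation is ours: c is the constant, φ is the AR coefficient, and x_T is the last observed price. Solving the recursion shows why the curve only ever moves in one direction. When 0 < φ < 1, the factor φ^h shrinks geometrically. So if x_T lies above the long-run mean μ, each forecast is lower than the one before, and the predictions approach μ from above.

```latex
\hat{x}_{T+1} = c + \phi\, x_T, \qquad
\hat{x}_{T+h} = c + \phi\, \hat{x}_{T+h-1} \quad (h \ge 2)

\hat{x}_{T+h} = \mu + \phi^{h}\,(x_T - \mu), \qquad
\mu = \frac{c}{1 - \phi}, \quad |\phi| < 1
```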

We can see the curve better if we “zoom out” a little bit. To achieve that, just expand the prediction interval – to, say, November 23, 2019 – then plot the results one more time:

Adjusting the time interval to a later date to see whether the forecasting curve will change.

The new plot shows the curve much more clearly and confirms this unrealistic decline. It also illustrates why a simple AR model is a poor choice for non-stationary data.

Even so, let’s have a look at how these time series predictions compare to the actual values over the same time period. Before we begin, make sure to set the “end” date back once again to January 1, 2015.

We only need to add the testing set values to the graph. In fact, we only need to add the FTSE price between the “start” and “end” periods since the rest is not relevant right now:

Adding the actual price values to forecasting code.

We can also set the color of the new plot to “blue” to ensure it is distinctly different from the “red” we use for time series forecasting:

Differentiating between the actual price values and the prediction values with the colors blue and red, respectively.
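A sketch of the comparison step follows. The test-set prices and the forecast are synthetic, and the title is an illustrative choice. Label-based slicing on a DatetimeIndex, df_test["ftse"][start_date:end_date], keeps only the actual prices inside the forecast window, and both endpoints are included.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

start_date, end_date = "2014-07-15", "2015-01-01"

# Synthetic data: a test set longer than the forecast window, plus a forecast
test_index = pd.bdate_range("2014-07-15", "2015-06-30")
rng = np.random.default_rng(1)
df_test = pd.DataFrame({"ftse": 6700 + np.cumsum(rng.normal(0, 40, len(test_index)))},
                       index=test_index)
pred_index = pd.bdate_range(start_date, end_date)
df_pred = pd.Series(5000 + 1700 * 0.98 ** np.arange(1, len(pred_index) + 1), index=pred_index)

# Keep only the actual prices between start_date and end_date (inclusive)
actual = df_test["ftse"][start_date:end_date]

ax = df_pred.plot(figsize=(20, 5), color="red", label="Predictions")
actual.plot(ax=ax, color="blue", label="Actual")
ax.set_title("Predictions vs Actual", size=24)
ax.legend()
plt.show()
```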

After running the cell, we see a very interesting pattern:

Running the cell in Python to reveal fluctuating actual values while the forecast remains in a descending straight line.

Over the course of the interval, actual prices cyclically jump up and down around the value we’re expecting. So, does this mean our choice of model is a good estimator for FTSE prices in the long run?

Not really. If we “zoom out” again, we see a trend where the actual values start to rise fairly consistently:

Readjusting the time series interval to reveal a positive ascending trend in the actual price values.

However, our prediction curve continues to decrease:

The time series forecasting trend continues to descend in a straight line.

This shows once again that AR models aren’t the best estimators of non-stationary data. For this specific case, we’d need a more complex model of time series forecasting in order to make better price predictions.

Time Series Forecasting in Python: Next Steps

While we no longer use crystal balls to predict the future, knowing what’s ahead of us is as important as ever. Using modern methods like time series forecasting is a great way to stay on top of industry trends and anticipate changes. We can not only predict what the weather would be like for the next harvest season, but also forecast the percentage of business revenue for the next quarter, stock investment trends, and more. This opens up a great expanse of career opportunities for those budding data scientists interested in analytics and future-proofing the world!


Instructor at 365 Data Science

Victor holds a double degree in Mathematics and Economics from Hamilton College and The London School of Economics and Political Science. His wide range of competencies, along with his warm and friendly approach to teaching, has contributed to the success of a great number of students. Victor’s courses include Data Preprocessing with NumPy, Probability, and Time Series Analysis with Python.
