Time series forecasting in Python

Contents

jiwidi/time-series-forecasting-with-python
README.md
Time Series Forecasting in Python: A Quick Practical Guide
What is Time Series Forecasting?
How to Make Predictions Using Time Series Forecasting in Python?
Fitting the Model
Specifying the Time Interval
Analyzing the Results
Time Series Forecasting in Python: Next Steps


A use-case-focused tutorial for time series forecasting with Python

jiwidi/time-series-forecasting-with-python


README.md

This repository contains a series of analyses, transforms and forecasting models frequently used when dealing with time series. The aim of this repository is to show how to model time series from scratch. For this we use a real use-case dataset (the Beijing air pollution dataset) to avoid the perfect, unrealistic use cases that often appear in these types of tutorials. If you want to rerun the notebooks, make sure you install all necessary dependencies (see the Guide).

A more detailed table of contents can be found in the main notebook.

The dataset used is the Beijing air quality public dataset. It contains pollution data from 2014 to 2019, sampled every 10 minutes, along with extra weather features such as pressure and temperature. We decided to resample the dataset to a daily frequency, both for easier data handling and to stay close to a real use-case scenario. Nobody would build a model to predict pollution 10 minutes ahead, but predicting 1 day ahead is realistic. In this case the series is already stationary, with some small seasonalities that change every year.
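The resampling step described above can be sketched with pandas. This is a minimal sketch under assumptions: the data is synthetic, the column names `pm25` and `pressure` are hypothetical, and averaging the readings within each day is one plausible aggregation. The repository's own preprocessing lives in datasets/download_datasets.py and may differ.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the raw data: three days of 10-minute readings with
# a pollution column and one weather feature (column names are assumptions).
index = pd.date_range("2014-01-01", periods=3 * 24 * 6, freq="10min")
rng = np.random.default_rng(0)
raw = pd.DataFrame(
    {"pm25": rng.uniform(20, 200, len(index)), "pressure": rng.normal(1015, 5, len(index))},
    index=index,
)

# Downsample to one row per calendar day by averaging all readings within that day
daily = raw.resample("D").mean()
print(daily.shape)  # (3, 2)
```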

To obtain an exact copy of the dataset used in this tutorial, run the script datasets/download_datasets.py. It automatically downloads the dataset and preprocesses it for you.

Time series decomposition

Level
Trend
Seasonality
Noise
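A classical additive decomposition splits a series into trend + seasonality + noise. The sketch below computes it by hand with pandas and makes two assumptions: an odd seasonal period (7, i.e. weekly seasonality in daily data) and synthetic data. The level, which is the baseline value of the series, is absorbed into the trend here. statsmodels also provides `statsmodels.tsa.seasonal.seasonal_decompose` for the same purpose.

```python
import numpy as np
import pandas as pd

def classical_decompose(series: pd.Series, period: int) -> pd.DataFrame:
    """Additive decomposition: series = trend + seasonal + residual (odd period assumed)."""
    # Trend (including the level): centered moving average over one full seasonal cycle
    trend = series.rolling(window=period, center=True).mean()
    detrended = series - trend
    # Seasonality: average detrended value at each position within the cycle
    position = np.arange(len(series)) % period
    seasonal_means = detrended.groupby(position).mean()
    seasonal_means -= seasonal_means.mean()  # make the seasonal effects sum to zero
    seasonal = pd.Series(seasonal_means.to_numpy()[position], index=series.index)
    # Noise: whatever trend and seasonality do not explain
    residual = series - trend - seasonal
    return pd.DataFrame(
        {"observed": series, "trend": trend, "seasonal": seasonal, "residual": residual}
    )

# Synthetic daily data: linear trend plus a fixed weekly pattern
idx = pd.date_range("2014-01-01", periods=8 * 7, freq="D")
t = np.arange(len(idx))
weekly = np.array([3.0, 1.0, 0.0, -1.0, -2.0, -1.0, 0.0])
y = pd.Series(100 + 0.5 * t + weekly[t % 7], index=idx)
parts = classical_decompose(y, period=7)
```

The centered moving average leaves `period // 2` missing trend values at each end of the series, which is the usual price of this simple method.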

ACF and PACF plots
Rolling mean and std
Dickey-Fuller test
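A quick stationarity check compares rolling statistics and autocorrelations. For the Dickey-Fuller test itself, statsmodels provides `statsmodels.tsa.stattools.adfuller`, and `statsmodels.graphics.tsaplots.plot_acf` and `plot_pacf` draw the AC and PAC plots. The sketch below sticks to pandas and compares white noise with a random walk, both synthetic. `Series.autocorr(lag)` is a Pearson correlation with the shifted series, so it differs slightly from the ACF estimator those plots use.

```python
import numpy as np
import pandas as pd

def stationarity_summary(series: pd.Series, window: int, max_lag: int) -> dict:
    """Rolling mean/std and lag-k autocorrelations for a quick stationarity check."""
    return {
        "rolling_mean": series.rolling(window).mean(),  # drifts if the mean is not constant
        "rolling_std": series.rolling(window).std(),    # changes if the variance is not constant
        # Pearson correlation between the series and itself shifted by k steps
        "acf": [series.autocorr(lag=k) for k in range(1, max_lag + 1)],
    }

rng = np.random.default_rng(0)
index = pd.date_range("2014-01-01", periods=1000, freq="D")
noise = pd.Series(rng.normal(size=1000), index=index)  # stationary white noise
walk = noise.cumsum()                                   # non-stationary random walk

noise_stats = stationarity_summary(noise, window=30, max_lag=5)
walk_stats = stationarity_summary(walk, window=30, max_lag=5)
print(noise_stats["acf"][0])  # close to 0
print(walk_stats["acf"][0])   # close to 1
```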

Difference transform
Log scale
Smoothing
Moving average
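Each of the four transforms listed above maps onto a short pandas call. This is a minimal sketch on a made-up price series. Reading "Smoothing" as exponential smoothing is an assumption. The last line shows that differencing is reversible: a cumulative sum plus the first observation restores the original scale, which is how forecasts of differences are turned back into forecasts of levels.

```python
import numpy as np
import pandas as pd

prices = pd.Series(
    [100.0, 102.0, 101.0, 105.0, 110.0, 108.0],
    index=pd.date_range("2014-01-01", periods=6, freq="D"),
)

diff_1 = prices.diff()                      # difference transform: y_t - y_{t-1}
log_prices = np.log(prices)                 # log scale: damps variance that grows with the level
log_diff = log_prices.diff()                # roughly the relative (percentage) change
ewm = prices.ewm(alpha=0.5).mean()          # smoothing (exponential smoothing, assumed reading)
smoothed = prices.rolling(window=3).mean()  # trailing 3-day moving average

# Undo the differencing: cumulative sum of the changes plus the starting value
restored = diff_1.cumsum() + prices.iloc[0]
```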

Autoregression (AR)
Moving Average (MA)
Autoregressive Moving Average (ARMA)
Autoregressive integrated moving average (ARIMA)
Seasonal autoregressive integrated moving average (SARIMA)
Bayesian regression
Lasso
SVM
Random forest
Nearest neighbors
XGBoost
LightGBM
Prophet
Long short-term memory with TensorFlow (LSTM)
DeepAR
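Several models in the list (Lasso, SVM, random forest, nearest neighbors, XGBoost, LightGBM) are generic regressors. One common way to use them for forecasting, assumed here rather than taken from the repository's notebooks, is to feed them a matrix of lagged values. The same matrix also fits the simplest model in the list: an AR(p) model estimated by ordinary least squares. statsmodels' ARIMA uses maximum likelihood instead, so its estimates can differ slightly. The helper names and the simulated AR(2) data are hypothetical.

```python
import numpy as np

def make_lag_matrix(y: np.ndarray, p: int):
    """Rows are [y_{t-1}, ..., y_{t-p}] and targets are y_t, for t = p..n-1."""
    X = np.column_stack([y[p - k : len(y) - k] for k in range(1, p + 1)])
    target = y[p:]
    return X, target

def fit_ar_ols(y: np.ndarray, p: int) -> np.ndarray:
    """Estimate [c, phi_1, ..., phi_p] of an AR(p) model by ordinary least squares."""
    X, target = make_lag_matrix(y, p)
    X = np.column_stack([np.ones(len(target)), X])  # intercept column for the constant c
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    return coef

# Simulate a stationary AR(2): y_t = 1.0 + 0.6 * y_{t-1} - 0.2 * y_{t-2} + noise
rng = np.random.default_rng(0)
n = 5000
y = np.zeros(n)
for t in range(2, n):
    y[t] = 1.0 + 0.6 * y[t - 1] - 0.2 * y[t - 2] + rng.normal()

coef = fit_ar_ols(y, p=2)
print(coef)  # close to [1.0, 0.6, -0.2]
```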

We split our results according to whether the model used the extra feature columns, such as temperature or pressure. Using them makes a large difference in the metrics, and the two cases represent different scenarios. Metrics used were:

Additional resources and literature

Models not tested but that are gaining popularity

There are several models we have not tried in this tutorial because they come from the academic world and their implementations are not 100% reliable. They are still worth mentioning:

Neural basis expansion analysis for interpretable time series forecasting (N-BEATS) | link, code
ESRNN | link, code

[1] Adhikari, R., & Agrawal, R. K. (2013). An introductory study on time series modeling and forecasting.
[2] Introduction to Time Series Forecasting With Python.
[3] Deep Learning for Time Series Forecasting.
[4] The Complete Guide to Time Series Analysis and Forecasting.
[5] How to Decompose Time Series Data into Trend and Seasonality.

Want to see another model tested? Do you have anything to add or fix? I’ll be happy to talk about it! Open an issue/PR 🙂


Source

Time Series Forecasting in Python: A Quick Practical Guide

In the past, people used to consult shamans who would peek into what the weather would be like during the upcoming months – whether it’ll be a favorable season for crops or there would be a drought. Back in those days, when a nation’s livelihood heavily relied on the mercy of the elements, people also relied on those spiritual guides to ease their worries.

Fortune-tellers are not as highly regarded in the 21st century, but we are still very much seeking accurate predictions to understand patterns in:

Weather conditions
Population movement
Stock prices
Business strategy improvement
Among many others.

Because we live in modern times, of course, we look to the future through entirely different means. One such means is time series forecasting. In this tutorial, we will briefly explain the idea of forecasting before using Python to make predictions based on a simple autoregressive model. We’ll also compare the results with the actual values for each period.

Without much ado, let’s cut to the chase.

What is Time Series Forecasting?

A time series is a sequence of data points collected over a period of time, typically at regular intervals. Time series forecasting is the process of analyzing that data, finding patterns in it, and using those patterns to estimate future values that help us with our long-term goals.

In simpler terms, when we’re forecasting, we’re basically trying to “predict” the future. This word should sound familiar, since we often hear it on the news in relation to the weather, politics, or other topics. Colloquially, we use “predict” and “forecast” interchangeably, but there is a subtle distinction between the two.

In time series, we expect patterns to persist as we progress through time. Therefore, we follow a simple structure:

Of course, finding the pattern is just a fancy way of saying we need to select the correct model, so we’re already halfway done. All that’s left is to make the predictions.

How to Make Predictions Using Time Series Forecasting in Python?

We follow three main steps when making predictions using time series forecasting in Python: fitting the model, specifying the time interval, and analyzing the results.

Fitting the Model

Let’s assume we’ve already created a time series object and loaded our dataset into Python. Our notebook should already contain the code for:

Scraping the data
Creating returns
Normalizing said returns
Separating the training and testing sets

Before we proceed, make sure you run the code, so we can have the data available and ready:
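The preparation steps listed above could look roughly like this. Scraping is out of scope here, so the sketch generates a synthetic FTSE-like price series on business days. The column names, the z-score normalization of the returns and the 80/20 split ratio are all assumptions, not the article's exact code.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the scraped data: FTSE-like closing prices on business days
index = pd.bdate_range("2010-01-04", "2015-06-30")
rng = np.random.default_rng(0)
df = pd.DataFrame({"ftse": 6000 + np.cumsum(rng.normal(0, 40, len(index)))}, index=index)

# Returns: percentage change from the previous business day (first value is NaN)
df["ret_ftse"] = df["ftse"].pct_change() * 100
# Normalized returns as z-scores (an assumed normalization choice; NaN is skipped)
df["norm_ret_ftse"] = (df["ret_ftse"] - df["ret_ftse"].mean()) / df["ret_ftse"].std()

# Chronological 80/20 split: no shuffling, so the test set lies strictly after the training set
size = int(len(df) * 0.8)
df_train, df_test = df.iloc[:size], df.iloc[size:]
```

The split is done by position rather than at random because a forecasting model must never be trained on values that come after the ones it is evaluated on.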

The first bit of coding we need to do ourselves involves fitting the model. As we mentioned earlier, we’re going to start with a simple autoregressive model and see how predictions evolve over time. We can name the variable model_ar:

To minimize the changes we’ll need to make in the future, let’s use the ARIMA class instead of ARMA. We’re also going to use the FTSE values. Lastly, we must set the order to (1, 0, 0), since ARIMA(1, 0, 0) is equivalent to AR(1): one autoregressive lag, no differencing and no moving-average terms:

Of course, we also need to store the fitted results before moving on:

Specifying the Time Interval

Next up, we must specify the time interval for our time series forecast. It could be a day, a week, or whatever period we feel like making it. Of course, the starting date of the forecasted period is essentially the first one we don’t have values for. In other words, we’re looking for the first day after the end of the training set.

Calling the tail() method on the training set shows its last few dates, and we select the first business day after the last one. Let’s say the last date in our training set is July 14, 2014. In that case, we’ll select July 15 as the first date to predict. For convenience, we will store this date in a variable called start_date:

Similarly, we can store the last date of our interval in a variable called end_date. As explained earlier, we can set it to any business day from July 15, 2014 onwards. The longer the period, the harder it is to see closely how the data moves between dates. For this reason, let’s go with a mid-range value and set it to January 1, 2015:

We can change this date at any point, as long as the new date falls on a business day. Otherwise, Python will throw an error, because the data is indexed with a business-day frequency.
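This sketch shows one way to derive start_date and end_date with pandas. It assumes a training set indexed by business days that ends on Monday, July 14, 2014. `is_business_day` is a hypothetical helper. Note that pandas' `BDay` offset treats every weekday from Monday to Friday as a business day and ignores public holidays.

```python
import pandas as pd

def is_business_day(date: str) -> bool:
    """True for Monday to Friday; public holidays are not taken into account."""
    return pd.Timestamp(date).dayofweek < 5

# Hypothetical training index ending on Monday, July 14, 2014
train_index = pd.bdate_range(end="2014-07-14", periods=10)
print(train_index[-5:])  # the dates df_train.tail() would show

# First business day after the end of the training set
start_date = (train_index[-1] + pd.offsets.BDay(1)).strftime("%Y-%m-%d")
end_date = "2015-01-01"

print(start_date)                 # 2014-07-15
print(is_business_day(end_date))  # True (January 1, 2015 was a Thursday)
```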

After setting everything up, we can finally make a forecast using the predict() method. We create a variable called df_pred and set it equal to the result of calling predict() on our fitted results, results_ar:

Inside the parentheses, we set the “start” and “end” arguments equal to the start and end dates we defined earlier:

To get an idea of what our predictions look like, let’s graph them using the plot() method:

We can also specify the color of the plotted time series by defining the argument of the same name. Conventionally, we prefer using blue for actual values and red for predicted ones, so let’s indicate that:

By all means, remember to define the appropriate figure size and set a title. A name like “Predictions” seems fitting, so let’s set it like that:
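The plotting settings described above could look like this with pandas and matplotlib. The forecast here is a hypothetical straight-line stand-in for the output of results_ar.predict(). The figure size and font size are arbitrary choices.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for the forecast returned by results_ar.predict(...)
pred_index = pd.bdate_range("2014-07-15", "2015-01-01")
df_pred = pd.Series(np.linspace(6690, 6600, len(pred_index)), index=pred_index)

df_pred.plot(figsize=(20, 5), color="red")  # red for predicted values
plt.title("Predictions", fontsize=24)
plt.show()
```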

Analyzing the Results

After adding the plt.show() command, we can run the cell to see our results:

We see a constantly decreasing line which isn’t at all realistic. In practice, we don’t expect prices to constantly decline. That would mean that the price today is as high as it is ever going to be, and the price tomorrow will always be lower. If that was the case, sellers would always want to get rid of their investments and buyers would have an incentive to hold off before they purchase. That way, everybody would be trying to sell, but nobody would be willing to buy.

The issue here comes from our model of choice. Because we’re using a simple AR model, each prediction is based only on the constant and the value from the previous period. Beyond the first step, that previous value is itself a prediction. So every new value is just a fraction of the previous one plus the constant term. This traces a smooth curve that keeps moving in the same direction, towards the model’s long-run mean.
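The shape of the curve follows from the AR(1) recursion. Write the long-run mean as mu = c / (1 - phi). Each forecast then satisfies x_hat(t+h) - mu = phi^h * (x(t) - mu), so with 0 < phi < 1 the forecasts move monotonically toward mu. They go down if the last observation is above mu and up if it is below. The sketch uses hypothetical parameter values, not fitted FTSE estimates.

```python
import numpy as np

def ar1_forecast(last_value: float, const: float, phi: float, steps: int) -> np.ndarray:
    """Iterate x_{t+1} = const + phi * x_t for `steps` periods, with no new shocks."""
    forecasts = np.empty(steps)
    x = last_value
    for h in range(steps):
        x = const + phi * x  # each forecast feeds the next one
        forecasts[h] = x
    return forecasts

# Hypothetical parameters: |phi| < 1 and the last observed value above the long-run mean
const, phi, last_value = 50.0, 0.99, 6000.0
path = ar1_forecast(last_value, const, phi, steps=120)

long_run_mean = const / (1 - phi)  # about 5000
print(np.all(np.diff(path) < 0))   # True: every forecast is lower than the previous one
print(path[-1])                    # still above 5000, approaching it from above
```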

We can see the curve better if we “zoom out” a little bit. To achieve that, just expand the prediction interval – to, say, November 23, 2019 – then plot the results one more time:

The new plot shows the curve much better than the previous one, so we can confirm this unrealistic decline. It also shows why the AR model is a poor estimator for non-stationary data.

Even so, let’s have a look at how these time series predictions compare to the actual values over the same time period. Before we begin, make sure to set the “end” date back once again to January 1, 2015.

We only need to add the testing set values to the graph. In fact, we only need to add the FTSE price between the “start” and “end” periods since the rest is not relevant right now:

We can also set the color of the new plot to “blue” to ensure it is distinctly different from the “red” we use for time series forecasting:
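Overlaying the actual test values could look like this. The data is synthetic and hypothetical. The key step is label-based slicing of the test set with `.loc[start_date:end_date]`, which includes both endpoints on a DatetimeIndex, so the actual values cover exactly the forecast window.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

start_date, end_date = "2014-07-15", "2015-01-01"

# Hypothetical stand-ins: a test set that extends past the forecast window,
# and a forecast covering exactly start_date..end_date
test_index = pd.bdate_range(start_date, "2015-06-30")
rng = np.random.default_rng(1)
df_test = pd.DataFrame(
    {"ftse": 6700 + np.cumsum(rng.normal(0, 40, len(test_index)))}, index=test_index
)
pred_index = pd.bdate_range(start_date, end_date)
df_pred = pd.Series(np.linspace(6690, 6600, len(pred_index)), index=pred_index)

# Keep only the actual FTSE values inside the forecast window (endpoints included)
actual = df_test.loc[start_date:end_date, "ftse"]

df_pred.plot(figsize=(20, 5), color="red")  # predictions in red
actual.plot(color="blue")                   # actual values in blue
plt.title("Predictions vs Actual", fontsize=24)
plt.show()
```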

After running the cell, we see a very interesting pattern:

Over the course of the interval, actual prices cyclically jump up and down around the value we’re expecting. So, does this mean our choice of model is a good estimator for FTSE prices in the long run?

Not really. If we “zoom out” again, we’ll see how there is a trend where the values start to go up in a somewhat consistent fashion:

However, our prediction curve continues to decrease:

This shows once again that AR models aren’t the best estimators of non-stationary data. For this specific case, we’d need a more complex model of time series forecasting in order to make better price predictions.

Time Series Forecasting in Python: Next Steps

While we no longer use crystal balls to predict the future, knowing what’s ahead of us is as important as ever. Using modern methods like time series forecasting is a great way to stay on top of industry trends and anticipate changes. We can not only predict what the weather would be like for the next harvest season, but also forecast the percentage of business revenue for the next quarter, stock investment trends, and more. This opens up a great expanse of career opportunities for those budding data scientists interested in analytics and future-proofing the world!


Instructor at 365 Data Science

Victor holds a double degree in Mathematics and Economics from Hamilton College and The London School of Economics and Political Science. His wide range of competencies, along with his warm and friendly approach to teaching, has contributed to the success of a great number of students. Victor’s courses include Data Preprocessing with NumPy, Probability, and Time Series Analysis with Python.