Time series forecasting in Python

A use-case focused tutorial for time series forecasting with python

jiwidi/time-series-forecasting-with-python

README.md

This repository contains a series of analyses, transforms, and forecasting models frequently used when dealing with time series. Its aim is to showcase how to model time series from scratch. To do so, we use a real-world dataset (the Beijing air pollution dataset) and avoid the idealized use cases, far from reality, that are often present in this type of tutorial. If you want to rerun the notebooks, make sure you install all the necessary dependencies (see the Guide).

You can find a more detailed table of contents in the main notebook.

The dataset used is the Beijing air quality public dataset. It contains pollution data from 2014 to 2019 sampled every 10 minutes, along with extra weather features such as pressure and temperature. We decided to resample the dataset to a daily frequency, both for easier data handling and to stay closer to a real use case (no one would build a model to predict pollution 10 minutes ahead; 1 day ahead looks more realistic). In this case the series is already stationary, with some small seasonalities that change every year.
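The daily resampling described above can be sketched with pandas. This is a minimal illustration on synthetic data, not the repository's preprocessing script: the column name `pollution` and the choice of the daily mean as the aggregate are assumptions.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the raw data: three days of 10-minute readings.
# The column name "pollution" is hypothetical.
index = pd.date_range("2014-01-01", periods=3 * 144, freq="10min")
rng = np.random.default_rng(0)
raw = pd.DataFrame(
    {"pollution": rng.normal(100.0, 10.0, size=len(index))}, index=index
)

# Downsample to one row per calendar day by averaging that day's readings.
# Other aggregates (median, max) are equally possible; the mean is an assumption.
daily = raw.resample("D").mean()
print(daily)
```

Each day holds 144 ten-minute readings, so the three synthetic days collapse into three daily rows.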

In order to obtain an exact copy of the dataset used in this tutorial, run the script datasets/download_datasets.py, which will automatically download the dataset and preprocess it for you.

  • Time series decomposition
    • Level
    • Trend
    • Seasonality
    • Noise
    • AC and PAC plots
    • Rolling mean and std
    • Dickey-Fuller test
    • Difference transform
    • Log scale
    • Smoothing
    • Moving average
    • Autoregression (AR)
    • Moving Average (MA)
    • Autoregressive Moving Average (ARMA)
    • Autoregressive integrated moving average (ARIMA)
    • Seasonal autoregressive integrated moving average (SARIMA)
    • Bayesian regression
    • Lasso
    • SVM
    • Random forest
    • Nearest neighbors
    • XGBoost
    • LightGBM
    • Prophet
    • Long short-term memory with TensorFlow (LSTM)
    • DeepAR

    We will divide our results depending on whether the extra feature columns, such as temperature or pressure, were used by the model, as this makes a huge difference in the metrics and represents two different scenarios. Metrics used were:

    Additional resources and literature

    Models not tested but that are gaining popularity

    There are several models we have not tried in this tutorial, as they come from the academic world and their implementations are not 100% reliable, but they are worth mentioning:

    • Neural basis expansion analysis for interpretable time series forecasting (N-BEATS)
    • ESRRN
    Adhikari, R., & Agrawal, R. K. (2013). An introductory study on time series modeling and forecasting [1]
    Introduction to Time Series Forecasting With Python [2]
    Deep Learning for Time Series Forecasting [3]
    The Complete Guide to Time Series Analysis and Forecasting [4]
    How to Decompose Time Series Data into Trend and Seasonality [5]

    Want to see another model tested? Do you have anything to add or fix? I’ll be happy to talk about it! Open an issue/PR 🙂

    Time Series Forecasting in Python: A Quick Practical Guide

    Time Series Forecasting in Python

    In the past, people used to consult shamans who would peek into what the weather would be like during the upcoming months – whether it’ll be a favorable season for crops or there would be a drought. Back in those days, when a nation’s livelihood heavily relied on the mercy of the elements, people also relied on those spiritual guides to ease their worries.

    While fortune-tellers are not as highly regarded in the 21st century, we’re still very much seeking accurate predictions to understand patterns in:

    • Weather conditions
    • Population movement
    • Stock prices
    • Business strategy improvement
    • Among many others.

    Because we live in modern times, of course, we look to the future through entirely different means. One such means is time series forecasting. In this tutorial, we will briefly explain the idea of forecasting before using Python to make predictions based on a simple autoregressive model. We’ll also compare the results with the actual values for each period.

    Without much ado, let’s cut to the chase.

    What is Time Series Forecasting?

    A time series is data collected over a period of time. Time series forecasting, in turn, is the process of analyzing that data, finding patterns, and drawing conclusions that help us with our long-term goals.

    In simpler terms, when we’re forecasting, we’re basically trying to “predict” the future. This word should sound familiar since we often hear it on the news, be it in relation to the weather, politics, or another topic altogether. Colloquially, we use “predict” and “forecast” interchangeably, but there’s a subtle distinction between the two.

    In time series, we expect patterns to persist as we progress through time. Therefore, we follow a simple structure: find the pattern in the data, then use it to predict the future.

    Of course, finding the pattern is just a fancy way of saying we need to select the correct model, so we’re already halfway done. All that’s left is to make the predictions.

    How to Make Predictions Using Time Series Forecasting in Python?

    We follow 3 main steps when making predictions using time series forecasting in Python: fitting the model, specifying the time interval, and analyzing the results.

    Fitting the Model

    Let’s assume we’ve already created a time series object and loaded our dataset into Python. Our notebook should already contain the code for:

    • Scraping the data
    • Creating returns
    • Normalizing said returns
    • Separating the training and testing sets
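The article's own preparation code is not shown, so here is a minimal sketch of the last three steps on a synthetic price series. The column names (`ftse`, `returns`, `norm_returns`), the 80/20 split, and the z-score normalization are all assumptions made for illustration; the original may normalize differently.

```python
import numpy as np
import pandas as pd

# Synthetic business-day price series standing in for the FTSE data.
dates = pd.date_range("2014-01-01", periods=200, freq="B")
rng = np.random.default_rng(42)
df = pd.DataFrame(
    {"ftse": 6500 + rng.normal(0, 20, size=len(dates)).cumsum()}, index=dates
)

# Creating returns: percentage change from the previous business day.
df["returns"] = df["ftse"].pct_change(1) * 100
df = df.dropna()  # the first row has no previous price to compare against

# Separating the training and testing sets chronologically (assumed 80/20 split).
size = int(len(df) * 0.8)
df_train, df_test = df.iloc[:size].copy(), df.iloc[size:].copy()

# Normalizing returns: z-score using training statistics only (an assumption),
# so that no information from the test period leaks into the training data.
mean, std = df_train["returns"].mean(), df_train["returns"].std()
df_train["norm_returns"] = (df_train["returns"] - mean) / std
df_test["norm_returns"] = (df_test["returns"] - mean) / std

print(len(df_train), len(df_test))
```

The split is done by position, not at random, because shuffling would let the model see observations from the future of the period it is tested on.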

    Before we proceed, make sure you run the code, so we can have the data available and ready:

    Running the code for a pre-loaded dataset in Python using the Run All Above command.

    The first bit of coding we need to do ourselves involves fitting the model. As we mentioned earlier, we’re going to start with a simple autoregressive model and see how predictions evolve over time. We can name the variable model_ar:

    Preparing to fit an autoregressive model in Python for time series forecasting purposes.

    To ensure we’ll need to make minimal changes in the future, let’s use the ARIMA class instead of ARMA. We’re also going to use FTSE values. Lastly, we must set the order to (1, 0, 0), since this is the ARIMA equivalent of the AR(1):

    Fitting a time series forecasting model in Python using ARIMA, setting the order to 1,0,0.

    Of course, we also need to store the fitted results before moving on:

    Storing the fitted results of a time series forecasting ARIMA model in Python.

    Specifying the Time Interval

    Next up, we must specify the time interval for our time series forecast. It could be a day, a week, or whatever period we feel like making it. Of course, the starting date of the forecasted period is essentially the first one we don’t have values for. In other words, we’re looking for the first day after the end of the training set.

    Calling the tail() method on the training set shows its last few rows, and we select the first business day following the last date shown. Let’s say the last date in our dataset is July 14, 2014. In that case, we’ll select July 15 as our first prediction. For convenience, we will store the date in a variable called start_date:

    Calling the tail method in Python and specifying a start date interval for a time series forecast.

    Similarly, we can store the last date of our interval in a variable called end_date. We can set it equal to any business day from July 15, 2014 onwards. The longer the period is, the harder it is to closely see how the data moves between dates. For this reason, let’s go with some mid-range value. We’ll set it to January 1, 2015:

    Calling the tail method in Python and specifying an end date interval for a time series forecast.

    This date can be altered at any point, as long as the new one falls on a business day. Otherwise, Python will raise an error.
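A small pandas sketch of the date handling described above. The helper `is_business_day` is hypothetical and only checks for Monday to Friday; it ignores public holidays, which matches a plain business-day frequency.

```python
import pandas as pd

# Last date of the training set, as in the article's example.
last_train_date = pd.Timestamp("2014-07-14")

# The forecast starts on the first business day after the training set ends.
start_date = last_train_date + pd.offsets.BDay(1)
end_date = pd.Timestamp("2015-01-01")


def is_business_day(ts):
    """Return True for Monday to Friday; public holidays are not considered."""
    return ts.dayofweek < 5


print(start_date.date(), is_business_day(start_date))
print(end_date.date(), is_business_day(end_date))
```

Checking a candidate end date this way before calling the model avoids the error mentioned above when the date falls on a weekend.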

    After setting everything up, we can finally make a forecast using the predict() method. We store the output in a new variable called df_pred, which we set equal to the result of calling predict() on the fitted results variable, results_ar:

    Making a time series forecast with the predict method in Python, then storing the results in an array equal to the results variable.

    Inside the parentheses, we set the “start” and “end” arguments equal to the start and end dates we defined earlier:

    Setting the start and end arguments of the time series forecast to equal the time interval specified earlier.

    To get an idea of what our predictions look like, let’s graph them using the plot() method:

    Preparing a graph for the time series predictions, using Python’s plot function.

    We can also specify the color of the plotted time series by defining the argument of the same name. Conventionally, we prefer using blue for actual values and red for predicted ones, so let’s indicate that:

    Defining a color argument for the plotted time series – blue for actual values and red for predictions.

    By all means, remember to define the appropriate figure size and set a title. A name like “Predictions” seems fitting, so let’s set it like that:

    Assigning title for the time series forecast as “Predictions” and the figure size as 24.
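A plotting sketch for the steps above, using placeholder predictions so that it runs on its own. The exact figure size is not given in the text, so `figsize=(20, 5)` and a title size of 24 are assumptions, as is the headless backend line, which can be dropped in a notebook.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Placeholder predictions: a slowly decaying curve over the forecast interval.
idx = pd.bdate_range("2014-07-15", "2015-01-01")
df_pred = pd.Series(6500 + 200 * 0.97 ** np.arange(len(idx)), index=idx)

# Red for predicted values, following the convention mentioned in the text.
ax = df_pred.plot(figsize=(20, 5), color="red")
ax.set_title("Predictions", size=24)

# In an interactive session, plt.show() would display the figure at this point.
```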

    Analyzing the Results

    After adding the plt.show() command, we can run the cell to see our results:

    Running Python’s plt.show command to see the forecast results.

    We see a constantly decreasing line which isn’t at all realistic. In practice, we don’t expect prices to constantly decline. That would mean that the price today is as high as it is ever going to be, and the price tomorrow will always be lower. If that was the case, sellers would always want to get rid of their investments and buyers would have an incentive to hold off before they purchase. That way, everybody would be trying to sell, but nobody would be willing to buy.

    The issue here comes from our model of choice. Because we’re using a simple AR model, the predictions are only based on the constant and the prices from the previous period. Thus, we get into this constant pattern of creating a curve where every new value is just a fraction of the previous one, put on top of the constant term.
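To see why the curve behaves this way, the recursion can be written out by hand. The sketch below iterates the AR(1) forecast in the form the paragraph describes, a constant plus a fraction of the previous value. The numbers are illustrative and are not estimates from the FTSE data.

```python
def ar1_forecast(last_value, const, phi, steps):
    """Iterate y_hat[t+1] = const + phi * y_hat[t] for the given number of steps."""
    forecasts = []
    current = last_value
    for _ in range(steps):
        current = const + phi * current
        forecasts.append(current)
    return forecasts


# Illustrative parameters: the last observed price sits above the long-run level.
const, phi = 65.0, 0.99
forecasts = ar1_forecast(last_value=6800.0, const=const, phi=phi, steps=500)

# With |phi| < 1 the recursion converges to const / (1 - phi).
long_run_mean = const / (1 - phi)

print(forecasts[0], forecasts[-1], long_run_mean)
```

When the last observed value lies above the long-run level, every forecast is lower than the one before it, and the gap to that level shrinks by the factor `phi` at each step. That is the steadily declining curve seen in the plot.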

    We can see the curve better if we “zoom out” a little bit. To achieve that, just expand the prediction interval – to, say, November 23, 2019 – then plot the results one more time:

    Adjusting the time interval to a later date to see whether the forecasting curve will change.

    The new plot shows the curve much better than the previous one, so we can verify this unrealistic decline. Now you understand why the AR model is so bad at estimating non-stationary data.

    Even so, let’s have a look at how these time series predictions compare to the actual values over the same time period. Before we begin, make sure to set the “end” date back once again to January 1, 2015.

    We only need to add the testing set values to the graph. In fact, we only need to add the FTSE price between the “start” and “end” periods since the rest is not relevant right now:

    Adding the actual price values to forecasting code.

    We can also set the color of the new plot to “blue” to ensure it is distinctly different from the “red” we use for time series forecasting:

    Differentiating between the actual price values and the prediction values with the colors blue and red, respectively.
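The comparison plot can be sketched as below, with synthetic stand-ins for both the test set and the predictions. The frame name `df_test`, the column name `ftse`, the title, and the figure size are assumptions for illustration.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

start_date, end_date = "2014-07-15", "2015-01-01"

# Synthetic test set that extends beyond the forecast interval.
test_idx = pd.bdate_range("2014-07-15", "2015-06-30")
rng = np.random.default_rng(3)
df_test = pd.DataFrame(
    {"ftse": 6600 + rng.normal(0, 30, size=len(test_idx)).cumsum()}, index=test_idx
)

# Placeholder predictions over the forecast interval.
pred_idx = pd.bdate_range(start_date, end_date)
df_pred = pd.Series(6500 + 100 * 0.97 ** np.arange(len(pred_idx)), index=pred_idx)

# Keep only the actual prices between the start and end dates (both inclusive).
actual = df_test["ftse"].loc[start_date:end_date]

ax = df_pred.plot(figsize=(20, 5), color="red", label="predicted")
actual.plot(ax=ax, color="blue", label="actual")
ax.set_title("Predictions vs Actual", size=24)
ax.legend()
```

Slicing with `.loc` on a date index includes both endpoints, so the actual values cover exactly the same dates as the predictions.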

    After running the cell, we see a very interesting pattern:

    Running the cell in Python to reveal fluctuating actual values while the forecast remains in a descending straight line.

    Over the course of the interval, actual prices cyclically jump up and down around the value we’re expecting. So, does this mean our choice of model is a good estimator for FTSE prices in the long run?

    Not really. If we “zoom out” again, we’ll see how there is a trend where the values start to go up in a somewhat consistent fashion:

    Readjusting the time series interval to reveal a positive ascending trend in the actual price values.

    However, our prediction curve continues to decrease:

    The time series forecasting trend continues to descend in a straight line.

    This shows once again that AR models aren’t the best estimators of non-stationary data. For this specific case, we’d need a more complex model of time series forecasting in order to make better price predictions.

    Time Series Forecasting in Python: Next Steps

    While we no longer use crystal balls to predict the future, knowing what’s ahead of us is as important as ever. Using modern methods like time series forecasting is a great way to stay on top of industry trends and anticipate changes. We can not only predict what the weather would be like for the next harvest season, but also forecast the percentage of business revenue for the next quarter, stock investment trends, and more. This opens up a great expanse of career opportunities for those budding data scientists interested in analytics and future-proofing the world!

    Instructor at 365 Data Science

    Victor holds a double degree in Mathematics and Economics from Hamilton College and The London School of Economics and Political Science. His wide range of competencies, along with his warm and friendly approach to teaching, has contributed to the success of a great number of students. Victor’s list of courses includes: Data Preprocessing with NumPy, Probability, and Time Series Analysis with Python.
