Vector autoregression (VAR) in Python

Contents

Vector Auto Regression for Multivariate Time Series Forecasting
Introduction to VAR model with related theories, concepts and implementation steps
The intuition behind the VAR model
Feature selection
Perform Augmented Dickey-Fuller to test for Stationarity
Select the order of the model
Fit the model and Evaluation
Rolling forecast origin
Evaluation metrics
Implementation with Python
Conclusion
References

Vector Auto Regression for Multivariate Time Series Forecasting

In the previous article, I discussed the basic theories and concepts of time series analysis and forecasting, and gave an intuition for some univariate time series forecasting models such as AR, MA, ARMA, ARIMA and SARIMA. You can go through it using the link below.

Introduction to Time Series Forecasting


I recently finished a research project on time series forecasting, for which I had to look into some multivariate time series techniques. I thought it would be beneficial to document my findings on Vector Auto Regression for anyone who is getting started with multivariate time series forecasting. In this article, you will learn what the VAR model is, along with the related theories and concepts needed to work with it.

The intuition behind the VAR model

The Vector Auto Regression (VAR) model is a multivariate forecasting algorithm, as the name suggests. It is used in scenarios where two or more time series influence each other. The term 'autoregressive' applies because each time series variable is modelled as a function of past values: the lags of the variables in the system, including its own, are used as predictors.


An important aspect in which VAR differs from ARMA, ARIMA and other univariate models is that the VAR model is bidirectional: the predictors influence Y, and Y in turn influences the predictors.

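A VAR model can be written as a system of equations, one per variable, which can be stacked into vector form. Suppose we have a vector of time series data Yt; then a VAR model with k variables and p lags can be expressed mathematically as

$$Y_t = B_0 + B_1 Y_{t-1} + B_2 Y_{t-2} + \dots + B_p Y_{t-p} + \varepsilon_t$$

where Yt, B0 and εt are k × 1 column vectors and B1, B2, …, Bp are k × k matrices of coefficients. The simplest VAR model for three variables has a single lag and can be expressed as

$$\begin{bmatrix} y_{1,t} \\ y_{2,t} \\ y_{3,t} \end{bmatrix} = \begin{bmatrix} b_{10} \\ b_{20} \\ b_{30} \end{bmatrix} + \begin{bmatrix} b_{11} & b_{12} & b_{13} \\ b_{21} & b_{22} & b_{23} \\ b_{31} & b_{32} & b_{33} \end{bmatrix} \begin{bmatrix} y_{1,t-1} \\ y_{2,t-1} \\ y_{3,t-1} \end{bmatrix} + \begin{bmatrix} \varepsilon_{1,t} \\ \varepsilon_{2,t} \\ \varepsilon_{3,t} \end{bmatrix}$$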
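To make the matrix form concrete, here is a minimal sketch of a three-variable VAR with one lag, Y_t = B0 + B1 Y_{t-1} + e_t, written with NumPy. The coefficient values, the noise scale and the function names are made up for illustration; they are not estimated from any data.

```python
import numpy as np

# Hypothetical coefficients for a 3-variable VAR(1): Y_t = B0 + B1 @ Y_{t-1} + e_t
B0 = np.array([1.0, 0.0, -1.0])        # k x 1 intercept vector
B1 = np.array([[0.5, 0.1, 0.0],
               [0.2, 0.3, 0.1],
               [0.0, 0.1, 0.4]])       # k x k lag-1 coefficient matrix


def var1_predict(y_prev):
    """Expected value of Y_t given Y_{t-1} (the error term has mean zero)."""
    return B0 + B1 @ y_prev


def var1_simulate(y0, steps, rng):
    """Simulate a path by adding Gaussian noise to the prediction at every step."""
    path = [np.asarray(y0, dtype=float)]
    for _ in range(steps):
        noise = rng.normal(scale=0.1, size=len(B0))
        path.append(var1_predict(path[-1]) + noise)
    return np.array(path)


y_prev = np.array([2.0, 1.0, 3.0])
one_step = var1_predict(y_prev)
path = var1_simulate(y_prev, steps=5, rng=np.random.default_rng(0))
print(one_step)
print(path.shape)
```

Each row of B1 is one equation of the system: the first row says how the first variable depends on the previous values of all three variables, which is where the bidirectional influence comes from.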

Main steps of building a VAR model

Feature selection
Perform Augmented Dickey-Fuller to test for Stationarity
Select the order of the model
Fit the model and Evaluation

Feature selection

Feature selection starts with a correlation analysis, and its first step is to draw a heatmap of the correlation matrix. The correlation matrix displays the pairwise correlation coefficients, which measure the linear historical relationship between the variables of the data frame.
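As a rough sketch of this step, the snippet below builds a correlation matrix with pandas on synthetic data and keeps the columns that correlate with a target above a cutoff. The column names, the synthetic data and the 0.5 threshold are assumptions for illustration only. The plotting helper is defined but not called, and it assumes seaborn and matplotlib are installed.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 500
common = rng.normal(size=n)  # shared signal that makes two columns correlated

# Synthetic data frame; the column names are hypothetical.
df = pd.DataFrame({
    "target": common + 0.5 * rng.normal(size=n),
    "related": common + 0.5 * rng.normal(size=n),
    "unrelated": rng.normal(size=n),
})

corr = df.corr()  # Pearson correlation coefficients between all column pairs

threshold = 0.5  # arbitrary cutoff, chosen only for this example
selected = [
    col for col in corr.columns
    if col != "target" and abs(corr.loc["target", col]) >= threshold
]
print(corr.round(2))
print(selected)


def plot_heatmap(corr_matrix):
    """Draw the correlation heatmap; plotting libraries are imported lazily."""
    import matplotlib.pyplot as plt
    import seaborn as sns

    sns.heatmap(corr_matrix, annot=True, vmin=-1, vmax=1, cmap="coolwarm")
    plt.show()
```

Calling plot_heatmap(corr) would draw the heatmap described above. The threshold filter is only one possible way to turn the matrix into a feature list.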

Perform Augmented Dickey-Fuller to test for Stationarity

Stationarity is a statistical property of a time series whose mean, variance and covariance do not change over time. The null hypothesis of the ADF test is that the time series is non-stationary (it has a unit root). So, if the p-value of the test is less than the significance level, the null hypothesis is rejected and the time series is considered stationary.

If the time series is not stationary, it needs to be differenced before training the model, and the predicted values must then be inverted, once for each time the series was differenced, to get the real forecast.
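A small sketch of differencing and its inversion with pandas is shown below. The numbers are made up, and the last two rows of the differenced data stand in for model forecasts so that the inversion can be checked against the known levels.

```python
import numpy as np
import pandas as pd

# Made-up level data for two variables.
levels = pd.DataFrame({
    "a": [10.0, 12.0, 15.0, 19.0, 24.0],
    "b": [5.0, 4.0, 6.0, 5.0, 7.0],
})

# First-order differencing; the first row becomes NaN and is dropped.
diffed = levels.diff().dropna()

# Pretend the first three levels are the training data and that the model
# forecast the last two differences perfectly.
train_levels = levels.iloc[:3]
forecast_diff = diffed.iloc[2:]

# Invert the differencing: cumulative sum of the forecast differences,
# added to the last observed level of each variable.
recovered = train_levels.iloc[-1] + forecast_diff.cumsum()
print(recovered)
```

If the series had been differenced twice, the same inversion would be applied twice, first to recover the first differences and then the levels.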

Select the order of the model

To select the right order of the VAR model, iteratively fit VAR models of increasing order and pick the order that gives the model with the lowest AIC. You can also check the other fit comparison criteria: BIC, FPE and HQIC.

Fit the model and Evaluation

After selecting the most suitable order and features for the VAR model, it can be trained on the training set. The model can then be evaluated with a technique called rolling forecast origin, which is well suited to time series forecasting.

Rolling forecast origin

The rolling forecast origin technique can be used to evaluate the model. It is a common way to evaluate time series forecasting algorithms, and it matters because of the sequential dependency between the values: the test observations must always come after the training observations. This approach uses a series of training sets, each one containing one more observation than the previous one, with a fixed look-ahead horizon (for example, one month). There are three main variations of the rolling forecast origin method.

One-step forecasting without re-estimation
Multi-step forecasting without re-estimation
Multi-step forecasting with re-estimation

In one-step forecasting without re-estimation, the model is estimated once on a single training set and then produces one-step forecasts over the remaining data. Multi-step forecasting without re-estimation is similar but forecasts multiple steps ahead. Multi-step forecasting with re-estimation is an alternative approach in which the model is re-trained at each iteration before each forecast is made. The fundamental way to do rolling forecast origin is to rebuild the model each time a new observation is added.

Evaluation metrics

In time series forecasting, a comprehensive evaluation criterion is essential to measure the performance of a model. Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), Mean Squared Error (MSE) and Mean Absolute Percentage Error (MAPE) are commonly used metrics.

An advantage of RMSE is that it penalizes large errors and is expressed in the same units as the forecast values. MSE is the squared form of RMSE and is commonly used as a regression loss function. MAE is sometimes preferred as a measure of average model accuracy because it is less sensitive to outliers, although this preference is debated in the literature. MAPE is the average of the absolute percentage errors; it is popular in industry because it is scale-independent and easy to interpret, but it is undefined when an actual value is zero.
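The four metrics can be computed in a few lines of NumPy, as in this sketch. The function name forecast_metrics and the sample values are made up for illustration.

```python
import numpy as np


def forecast_metrics(actual, predicted):
    """Return MSE, RMSE, MAE and MAPE (in percent) for two equal-length arrays."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    errors = actual - predicted
    mse = np.mean(errors ** 2)
    return {
        "MSE": mse,
        "RMSE": np.sqrt(mse),
        "MAE": np.mean(np.abs(errors)),
        # MAPE divides by the actual values, so it breaks down if any is zero.
        "MAPE": np.mean(np.abs(errors / actual)) * 100,
    }


actual = [100, 200, 300, 400]
predicted = [110, 190, 310, 380]
metrics = forecast_metrics(actual, predicted)
print(metrics)
```

For a VAR model the function would be applied to each forecast variable separately, since the variables may be on different scales.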

Implementation with Python

In Python, all the necessary libraries are available for working with VAR models. Seaborn, a Python data visualization library based on matplotlib, can be used to draw the correlation heatmap. The statsmodels package provides an implementation of the ADF test, which returns the value of the test statistic, the p-value, the number of lags considered for the test and the critical value cutoffs. statsmodels also provides an implementation of the VAR model. So you can implement and evaluate a VAR model fairly easily using Python.

Conclusion

This article covered the intuition behind the VAR model and the main steps needed to implement and evaluate one. I hope you found it useful.

References

[2] S. Siami-Namini, N. Tavakoli and A. Siami Namin, “A Comparison of ARIMA and LSTM in Forecasting Time Series,” 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA), Orlando, FL, 2018, pp. 1394–1401, doi: 10.1109/ICMLA.2018.00227.
[3] T. Chai and R. R. Draxler, “Root mean square error (RMSE) or mean absolute error (MAE)? Arguments against avoiding RMSE in the literature,” Geosci. Model Dev., vol. 7, no. 3, pp. 1247–1250, Jun. 2014.
