Linear Regression¶
This module covers linear models with independently and identically distributed errors, as well as models whose errors are heteroscedastic or autocorrelated. It allows estimation by ordinary least squares (OLS), weighted least squares (WLS), generalized least squares (GLS), and feasible generalized least squares with autocorrelated AR(p) errors (GLSAR).
See Module Reference for commands and arguments.
Examples¶
# Load modules and data
In [1]: import numpy as np

In [2]: import statsmodels.api as sm

In [3]: spector_data = sm.datasets.spector.load()

In [4]: spector_data.exog = sm.add_constant(spector_data.exog, prepend=False)

# Fit and summarize OLS model
In [5]: mod = sm.OLS(spector_data.endog, spector_data.exog)

In [6]: res = mod.fit()

In [7]: print(res.summary())
                            OLS Regression Results
==============================================================================
Dep. Variable:                  GRADE   R-squared:                       0.416
Model:                            OLS   Adj. R-squared:                  0.353
Method:                 Least Squares   F-statistic:                     6.646
Date:                Fri, 05 May 2023   Prob (F-statistic):            0.00157
Time:                        13:59:54   Log-Likelihood:                -12.978
No. Observations:                  32   AIC:                             33.96
Df Residuals:                      28   BIC:                             39.82
Df Model:                           3
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
GPA            0.4639      0.162      2.864      0.008       0.132       0.796
TUCE           0.0105      0.019      0.539      0.594      -0.029       0.050
PSI            0.3786      0.139      2.720      0.011       0.093       0.664
const         -1.4980      0.524     -2.859      0.008      -2.571      -0.425
==============================================================================
Omnibus:                        0.176   Durbin-Watson:                   2.346
Prob(Omnibus):                  0.916   Jarque-Bera (JB):                0.167
Skew:                           0.141   Prob(JB):                        0.920
Kurtosis:                       2.786   Cond. No.                         176.
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
Detailed examples can be found here:
Technical Documentation¶
The statistical model is assumed to be
\(Y = X\beta + \mu\) , where \(\mu\sim N\left(0,\Sigma\right).\)
Depending on the properties of \(\Sigma\), four classes are currently available:
- GLS : generalized least squares for arbitrary covariance \(\Sigma\)
- OLS : ordinary least squares for i.i.d. errors \(\Sigma=\textbf{I}\)
- WLS : weighted least squares for heteroscedastic errors \(\text{diag}\left(\Sigma\right)\)
- GLSAR : feasible generalized least squares with autocorrelated AR(p) errors \(\Sigma=\Sigma\left(\rho\right)\)
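To make the four covariance structures concrete, the sketch below builds one \(\Sigma\) of each kind with NumPy and applies the textbook GLS estimator \(\hat\beta = (X^{T}\Sigma^{-1}X)^{-1}X^{T}\Sigma^{-1}Y\). The helper name `gls_estimate`, the simulated data, and the AR(1) form \(\Sigma_{ij}=\rho^{|i-j|}\) are assumptions chosen for illustration; this is not a description of how the library computes its estimates.

```python
import numpy as np


def gls_estimate(y, X, sigma):
    """Textbook GLS estimator (X' Sigma^-1 X)^-1 X' Sigma^-1 y (hypothetical helper)."""
    sigma_inv_X = np.linalg.solve(sigma, X)  # Sigma^-1 X
    sigma_inv_y = np.linalg.solve(sigma, y)  # Sigma^-1 y
    return np.linalg.solve(X.T @ sigma_inv_X, X.T @ sigma_inv_y)


rng = np.random.default_rng(0)
n = 50
X = np.column_stack([rng.normal(size=n), np.ones(n)])  # one regressor plus a constant
y = X @ np.array([2.0, -1.0]) + rng.normal(size=n)

# OLS case: i.i.d. errors, Sigma = I
sigma_iid = np.eye(n)

# WLS case: heteroscedastic errors, diagonal Sigma
sigma_diag = np.diag(rng.uniform(0.5, 2.0, size=n))

# GLSAR-style case: AR(1) errors, Sigma_ij = rho^|i-j| (assumed parameterization)
rho = 0.5
idx = np.arange(n)
sigma_ar1 = rho ** np.abs(idx[:, None] - idx[None, :])

beta_iid = gls_estimate(y, X, sigma_iid)
beta_diag = gls_estimate(y, X, sigma_diag)
beta_ar1 = gls_estimate(y, X, sigma_ar1)

# With Sigma = I the GLS formula reduces to ordinary least squares
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_iid, beta_lstsq)
```

The arbitrary-\(\Sigma\) case (GLS) uses the same helper with any symmetric positive definite matrix.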
All regression models define the same methods and follow the same structure, and can be used in a similar fashion. Some of them contain additional model specific methods and attributes.
GLS is the superclass of the other regression classes except for RecursiveLS, RollingWLS and RollingOLS.
References¶
General reference for regression models:
Econometrics references for regression models:
- R. Davidson and J.G. MacKinnon. “Econometric Theory and Methods,” Oxford, 2004.
- W. Greene. “Econometric Analysis,” 5th ed., Pearson, 2003.
Attributes¶
The following is a more detailed description of the attributes, most of which are common to all regression classes.
The p x n Moore-Penrose pseudoinverse of the whitened design matrix. It is approximately equal to \(\left(X^{T}\Sigma^{-1}X\right)^{-1}X^{T}\Psi\) , where \(\Psi\) is defined such that \(\Psi\Psi^{T}=\Sigma^{-1}\) .
The n x n upper triangular matrix \(\Psi^{T}\) that satisfies \(\Psi\Psi^{T}=\Sigma^{-1}\) .
The model degrees of freedom. This is equal to p - 1, where p is the number of regressors. Note that the intercept is not counted as using a degree of freedom here.
The residual degrees of freedom. This is equal to n - p, where n is the number of observations and p is the number of parameters. Note that the intercept is counted as using a degree of freedom here.
The value of the log-likelihood function of the fitted model.
The number of observations n
A p x p array equal to \((X^{T}\Sigma^{-1}X)^{-1}\) .
The n x n covariance matrix of the error terms: \(\mu\sim N\left(0,\Sigma\right)\) .
The whitened design matrix \(\Psi^{T}X\) .
The whitened response variable \(\Psi^{T}Y\) .
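The quantities described above can be reproduced by hand with NumPy. The sketch below is a minimal illustration under assumed inputs: the AR(1)-style \(\Sigma\), the simulated data, and all variable names are hypothetical, and the Cholesky factor is only one way to obtain a \(\Psi\) with \(\Psi\Psi^{T}=\Sigma^{-1}\).

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 30, 3
X = np.column_stack([rng.normal(size=(n, p - 1)), np.ones(n)])  # last column is the intercept
y = X @ np.array([1.0, -0.5, 2.0]) + rng.normal(size=n)

# Assumed error covariance: Sigma_ij = 0.5^|i-j| (symmetric positive definite)
idx = np.arange(n)
sigma = 0.5 ** np.abs(idx[:, None] - idx[None, :])
sigma_inv = np.linalg.inv(sigma)

# np.linalg.cholesky returns a lower-triangular Psi with Psi @ Psi.T = Sigma^-1,
# so Psi.T is the upper-triangular factor
psi = np.linalg.cholesky(sigma_inv)

whitened_X = psi.T @ X  # whitened design matrix
whitened_y = psi.T @ y  # whitened response

pinv_whitened_X = np.linalg.pinv(whitened_X)         # p x n pseudoinverse
normalized_cov = np.linalg.inv(X.T @ sigma_inv @ X)  # p x p array

# Least squares on the whitened data gives the GLS coefficients
beta = pinv_whitened_X @ whitened_y

df_model = p - 1  # intercept not counted
df_resid = n - p  # intercept counted
print(beta, df_model, df_resid)
```

Because the whitened design has full column rank, its pseudoinverse equals \(\left(X^{T}\Sigma^{-1}X\right)^{-1}X^{T}\Psi\) up to floating-point error.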
Module Reference¶
Model Classes¶
OLS (endog[, exog, missing, hasconst])
GLS (endog, exog[, sigma, missing, hasconst])
Generalized Least Squares
WLS (endog, exog[, weights, missing, hasconst])
GLSAR (endog[, exog, rho, missing, hasconst])
Generalized Least Squares with AR covariance structure
yule_walker (x[, order, method, df, inv, demean])
Estimate AR(p) parameters from a sequence using the Yule-Walker equations.
Compute Burg’s AR(p) parameter estimator.
RollingWLS (endog, exog[, window, weights, . ])
Rolling Weighted Least Squares
RollingOLS (endog, exog[, window, min_nobs, . ])
Rolling Ordinary Least Squares
An implementation of ProcessCovariance using the Gaussian kernel.
ProcessMLE (endog, exog, exog_scale, . [, cov])
Fit a Gaussian mean/variance regression model.
Sliced Inverse Regression (SIR)
Principal Hessian Directions (PHD)
Sliced Average Variance Estimation (SAVE)
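For the Yule-Walker entry listed above, the underlying idea can be sketched in a few lines of NumPy: estimate the autocovariances of the series, then solve the resulting Toeplitz system for the AR coefficients. The function name `yule_walker_sketch` is hypothetical and the code uses the biased autocovariance estimator as an assumption; it illustrates the equations and is not the library's `yule_walker` implementation.

```python
import numpy as np


def yule_walker_sketch(x, order=1):
    """Estimate AR(order) coefficients from the Yule-Walker equations (illustrative only)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()  # demean the series
    n = x.shape[0]
    # Biased sample autocovariances r[0], ..., r[order]
    r = np.array([x[: n - k] @ x[k:] / n for k in range(order + 1)])
    # Toeplitz matrix R with entries R[i, j] = r[|i - j|]
    lags = np.arange(order)
    R = r[np.abs(lags[:, None] - lags[None, :])]
    rho = np.linalg.solve(R, r[1:])
    # Innovation standard deviation implied by the fitted coefficients
    sigma = np.sqrt(r[0] - r[1:] @ rho)
    return rho, sigma


# Simulate an AR(1) process with coefficient 0.6 and unit innovation variance
rng = np.random.default_rng(2)
n = 5000
true_rho = 0.6
e = rng.normal(size=n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = true_rho * x[t - 1] + e[t]

rho_hat, sigma_hat = yule_walker_sketch(x, order=1)
print(rho_hat, sigma_hat)
```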
Results Classes¶
Fitting a linear regression model returns a results class. OLS has a specific results class with some additional methods compared to the results class of the other linear models.
This class summarizes the fit of a linear regression model.
Results class for an OLS model.
Results class for predictions.
Results for models estimated using regularization
Results instance for the QuantReg model
RecursiveLSResults (model, params, filter_results)
Class to hold results from fitting a recursive least squares model.
Results from rolling regressions
Results class for Gaussian process regression models.
Results class for a dimension reduction regression.
statsmodels.regression.linear_model.OLS¶
A 1-d endogenous response variable. The dependent variable.
A nobs x k array where nobs is the number of observations and k is the number of regressors. An intercept is not included by default and should be added by the user. See statsmodels.tools.add_constant .
Available options are ‘none’, ‘drop’, and ‘raise’. If ‘none’, no nan checking is done. If ‘drop’, any observations with nans are dropped. If ‘raise’, an error is raised. Default is ‘none’.
hasconst None or bool
Indicates whether the RHS includes a user-supplied constant. If True, a constant is not checked for and k_constant is set to 1 and all result statistics are calculated as if a constant is present. If False, a constant is not checked for and k_constant is set to 0.
Extra arguments that are used to set model properties when using the formula interface.
Fit a linear model using Weighted Least Squares.
Fit a linear model using Generalized Least Squares.
No constant is added by the model unless you are using formulas.
>>> import statsmodels.api as sm
>>> import numpy as np
>>> duncan_prestige = sm.datasets.get_rdataset("Duncan", "carData")
>>> Y = duncan_prestige.data['income']
>>> X = duncan_prestige.data['education']
>>> X = sm.add_constant(X)
>>> model = sm.OLS(Y,X)
>>> results = model.fit()
>>> results.params
const        10.603498
education     0.594859
dtype: float64

>>> results.tvalues
const        2.039813
education    6.892802
dtype: float64

>>> print(results.t_test([1, 0]))
                             Test for Constraints
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
c0            10.6035      5.198      2.040      0.048       0.120      21.087
==============================================================================

>>> print(results.f_test(np.identity(2)))
df_denom=43, df_num=2>
Has an attribute weights = array(1.0) due to inheritance from WLS.
fit ([method, cov_type, cov_kwds, use_t])
Return a regularized fit to a linear regression model.
from_formula (formula, data[, subset, drop_cols])
Create a Model from a formula and dataframe.
Construct a random number generator for the predictive distribution.
Evaluate the Hessian function at a given point.
Calculate the weights for the Hessian.
Fisher information matrix of model.
Initialize model components.
The likelihood function for the OLS model.
Return linear predicted values from a design matrix.
Evaluate the score function at a given point.
OLS model whitener does nothing.
The model degree of freedom.
The residual degree of freedom.
Names of endogenous variables.
Names of exogenous variables.
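As a worked illustration of the likelihood entry above, the sketch below evaluates the Gaussian log-likelihood of an OLS fit with NumPy, with the error variance replaced by its maximum likelihood estimate SSR/n. The simulated data and variable names are hypothetical, and whether the library's method evaluates exactly this expression is an assumption.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 60
X = np.column_stack([rng.normal(size=n), np.ones(n)])
y = X @ np.array([1.5, 0.3]) + rng.normal(size=n)

# OLS coefficients, residuals, and sum of squared residuals
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
ssr = resid @ resid

# Concentrated form: -n/2 * (log(2*pi) + log(SSR/n) + 1)
llf_concentrated = -0.5 * n * (np.log(2 * np.pi) + np.log(ssr / n) + 1)

# Direct form: sum of normal log densities at sigma^2 = SSR/n
sigma2_ml = ssr / n
llf_direct = np.sum(-0.5 * np.log(2 * np.pi * sigma2_ml)
                    - resid**2 / (2 * sigma2_ml))

print(llf_concentrated, llf_direct)
```

The two expressions agree because substituting \(\hat\sigma^{2}=\mathrm{SSR}/n\) turns the quadratic term into the constant \(-n/2\).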