Calculating a covariance matrix in Python

Contents

How to create a covariance matrix in Python
numpy.cov
Python | numpy.cov() function

How to create a covariance matrix in Python

Covariance is a measure of how changes in one variable are associated with changes in a second variable. Specifically, it measures the degree to which two variables are linearly related.

A covariance matrix is a square matrix that shows the covariance between each pair of variables in a dataset. It is a compact way to see how the variables in a dataset relate to one another.
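To make the definition concrete, here is a minimal sketch that builds a population covariance matrix directly from centered data and compares it with np.cov. The array x is made-up toy data (an assumption for illustration only), with one variable per row and one observation per column.

```python
import numpy as np

# Hypothetical toy data: 2 variables (rows), 5 observations (columns).
x = np.array([[2.0, 4.0, 6.0, 8.0, 10.0],
              [1.0, 3.0, 2.0, 5.0, 4.0]])

n_obs = x.shape[1]

# Subtract each variable's mean from its own row.
centered = x - x.mean(axis=1, keepdims=True)

# Entry (i, j) is the average product of deviations of variables i and j.
manual_cov = centered @ centered.T / n_obs

# bias=True makes np.cov divide by N as well, so the two should agree.
numpy_cov = np.cov(x, bias=True)

print(manual_cov)
print(np.allclose(manual_cov, numpy_cov))
```

The diagonal holds each variable's variance, and the matrix is symmetric because the covariance of i with j equals the covariance of j with i.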

The following example walks through the steps for creating a covariance matrix in Python.

Step 1. Create the dataset.

First, we'll create a dataset that contains the test scores of 10 different students in three subjects: math, science, and history.

    import numpy as np

    # One list per subject; position i in each list is student i's score
    math = [84, 82, 81, 89, 73, 94, 92, 70, 88, 95]
    science = [85, 82, 72, 77, 75, 89, 95, 84, 77, 94]
    history = [97, 94, 93, 95, 88, 82, 78, 84, 69, 78]

    # Each row is a variable (subject), each column an observation (student)
    data = np.array([math, science, history])

Step 2. Create the covariance matrix.

Next, we'll create the covariance matrix for this dataset using the NumPy function cov(), specifying bias=True so that we compute the population covariance matrix.

    np.cov(data, bias=True)

    array([[ 64.96,  33.2 , -24.44],
           [ 33.2 ,  56.4 , -24.1 ],
           [-24.44, -24.1 ,  75.56]])
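As a hedged illustration of what bias=True changes, the sketch below computes both the population matrix (divide by N) and the default sample matrix (divide by N - 1) for the same scores. The variable names math_scores, population_cov and sample_cov are chosen here for illustration and are not part of the original example.

```python
import numpy as np

math_scores = [84, 82, 81, 89, 73, 94, 92, 70, 88, 95]
science = [85, 82, 72, 77, 75, 89, 95, 84, 77, 94]
history = [97, 94, 93, 95, 88, 82, 78, 84, 69, 78]
data = np.array([math_scores, science, history])

n = data.shape[1]  # number of observations (students)

population_cov = np.cov(data, bias=True)  # normalizes by N
sample_cov = np.cov(data)                 # default: normalizes by N - 1

# The two matrices differ only by the constant factor N / (N - 1).
print(np.allclose(sample_cov, population_cov * n / (n - 1)))
```

Use the sample version when the 10 students are treated as a sample from a larger population, and the population version when they are the entire group of interest.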

Step 3. Interpret the covariance matrix.

The values along the diagonal of the matrix are simply the variances of each subject. For example:

The variance of the math scores is 64.96
The variance of the science scores is 56.4
The variance of the history scores is 75.56
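One way to check this reading of the diagonal is to compare it with the per-subject variances from ndarray.var, which also divides by N by default. This is a small sketch that rebuilds the same dataset; the name math_scores is used here only to avoid shadowing Python's math module.

```python
import numpy as np

math_scores = [84, 82, 81, 89, 73, 94, 92, 70, 88, 95]
science = [85, 82, 72, 77, 75, 89, 95, 84, 77, 94]
history = [97, 94, 93, 95, 88, 82, 78, 84, 69, 78]
data = np.array([math_scores, science, history])

cov = np.cov(data, bias=True)

# Population variance of each row (ddof=0 is the default for var).
variances = data.var(axis=1)

print(np.diag(cov))
print(np.allclose(np.diag(cov), variances))
```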

The other values in the matrix are the covariances between the subjects. For example:

The covariance between the math and science scores is 33.2
The covariance between the math and history scores is -24.44
The covariance between the science and history scores is -24.1

A positive covariance indicates that two variables tend to increase or decrease together. For example, math and science have a positive covariance (33.2), which indicates that students who score high in math also tend to score high in science. Conversely, students who score low in math also tend to score low in science.

A negative covariance indicates that as one variable increases, the second tends to decrease. For example, math and history have a negative covariance (-24.44), which indicates that students who score high in math tend to score low in history. Conversely, students who score low in math tend to score high in history.
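The sign of a covariance is easy to read, but its size depends on the units and spread of the variables. A common way to compare strengths is to rescale the covariance matrix into a correlation matrix; the sketch below does this by hand and checks the result against np.corrcoef. The intermediate names std and corr_from_cov are illustrative assumptions.

```python
import numpy as np

math_scores = [84, 82, 81, 89, 73, 94, 92, 70, 88, 95]
science = [85, 82, 72, 77, 75, 89, 95, 84, 77, 94]
history = [97, 94, 93, 95, 88, 82, 78, 84, 69, 78]
data = np.array([math_scores, science, history])

cov = np.cov(data, bias=True)

# Standard deviations are the square roots of the diagonal variances.
std = np.sqrt(np.diag(cov))

# Divide each covariance by the product of the two standard deviations.
corr_from_cov = cov / np.outer(std, std)

# np.corrcoef computes the same normalized matrix directly.
corr = np.corrcoef(data)

print(np.round(corr, 3))
print(np.allclose(corr_from_cov, corr))
```

Each off-diagonal entry now lies between -1 and 1, so the three pairs of subjects can be compared on the same scale.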

Step 4. Visualize the covariance matrix (optional).

You can visualize the covariance matrix using the heatmap() function from the seaborn package:

    import seaborn as sns
    import matplotlib.pyplot as plt

    cov = np.cov(data, bias=True)
    labs = ['math', 'science', 'history']

    # annot=True writes each covariance value inside its cell
    sns.heatmap(cov, annot=True, fmt='g', xticklabels=labs, yticklabels=labs)
    plt.show()

You can also change the color palette by specifying the cmap argument:

    sns.heatmap(cov, annot=True, fmt='g', xticklabels=labs, yticklabels=labs, cmap='YlGnBu')
    plt.show()

For more details on how to style this heatmap, refer to the seaborn documentation.


numpy.cov

Estimate a covariance matrix, given data and weights.

Covariance indicates the level to which two variables vary together. If we examine N-dimensional samples, \(X = [x_1, x_2, \ldots, x_N]^T\), then the covariance matrix element \(C_{ij}\) is the covariance of \(x_i\) and \(x_j\). The element \(C_{ii}\) is the variance of \(x_i\).

See the notes for an outline of the algorithm.

Parameters : m array_like

A 1-D or 2-D array containing multiple variables and observations. Each row of m represents a variable, and each column a single observation of all those variables. Also see rowvar below.

y array_like, optional

An additional set of variables and observations. y has the same form as that of m.

rowvar bool, optional

If rowvar is True (default), then each row represents a variable, with observations in the columns. Otherwise, the relationship is transposed: each column represents a variable, while the rows contain observations.
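As a small sketch of how rowvar affects the input layout, the example below stores the same made-up scores once with variables in rows and once with variables in columns (the layout typical of tabular data). The array names by_rows and by_cols are assumptions for illustration.

```python
import numpy as np

# Two variables as rows, four observations as columns.
by_rows = np.array([[84.0, 82.0, 81.0, 89.0],
                    [85.0, 82.0, 72.0, 77.0]])

# The same data with one column per variable.
by_cols = by_rows.T

cov_rows = np.cov(by_rows)                # rowvar=True is the default
cov_cols = np.cov(by_cols, rowvar=False)  # tell NumPy the columns are variables

print(cov_rows.shape, cov_cols.shape)
print(np.allclose(cov_rows, cov_cols))
```

Forgetting rowvar=False on column-oriented data does not raise an error; it silently returns a matrix with one row per observation, so checking the output shape is a cheap safeguard.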

bias bool, optional

Default normalization (False) is by (N - 1), where N is the number of observations given (unbiased estimate). If bias is True, then normalization is by N. These values can be overridden by using the keyword ddof in numpy versions >= 1.5.

ddof int, optional

If not None the default value implied by bias is overridden. Note that ddof=1 will return the unbiased estimate, even if both fweights and aweights are specified, and ddof=0 will return the simple average. See the notes for the details. The default value is None .
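The interaction between bias and ddof can be checked with a short sketch. The array x below is arbitrary toy data (an assumption), and the comparisons show that an explicit ddof takes precedence over bias.

```python
import numpy as np

# Hypothetical data: 2 variables, 4 observations.
x = np.array([[0.0, 1.0, 2.0, 4.0],
              [3.0, 1.0, 0.0, 2.0]])

default_cov = np.cov(x)                       # bias=False, so ddof is 1
overridden = np.cov(x, bias=True, ddof=1)     # ddof=1 wins over bias=True
population = np.cov(x, bias=True)             # bias=True, so ddof is 0
explicit_ddof0 = np.cov(x, ddof=0)            # same normalization as bias=True

print(np.allclose(default_cov, overridden))
print(np.allclose(population, explicit_ddof0))
```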

fweights array_like, int, optional

1-D array of integer frequency weights; the number of times each observation vector should be repeated.

aweights array_like, optional

1-D array of observation vector weights. These relative weights are typically large for observations considered "important" and smaller for observations considered less "important". If ddof=0 the array of weights can be used to assign probabilities to observation vectors.
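The two kinds of weights can be illustrated with a brief sketch on made-up data. It checks that fweights behaves like physically repeating observations, and that only the relative size of aweights matters (scaling them all by a constant leaves the result unchanged). The arrays x, f and a are assumptions for illustration.

```python
import numpy as np

# Hypothetical data: 2 variables, 3 observations.
x = np.array([[1.0, 2.0, 4.0],
              [3.0, 1.0, 0.0]])

# Frequency weights: observation 0 once, observation 1 three times, observation 2 twice.
f = np.array([1, 3, 2])
weighted = np.cov(x, fweights=f)
repeated = np.cov(np.repeat(x, f, axis=1))  # literally repeat the columns
print(np.allclose(weighted, repeated))

# Observation weights are relative: doubling all of them changes nothing.
a = np.array([0.5, 1.0, 2.0])
print(np.allclose(np.cov(x, aweights=a), np.cov(x, aweights=2 * a)))
```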

dtype data-type, optional

Data-type of the result. By default, the return data-type will have at least numpy.float64 precision.

Returns : out ndarray

The covariance matrix of the variables.

See also : corrcoef

Normalized covariance matrix

Notes

Assume that the observations are in the columns of the observation array m and let f = fweights and a = aweights for brevity. The steps to compute the weighted covariance are as follows:

>>> m = np.arange(10, dtype=np.float64)
>>> f = np.arange(10) * 2
>>> a = np.arange(10) ** 2.
>>> ddof = 1
>>> w = f * a
>>> v1 = np.sum(w)
>>> v2 = np.sum(w * a)
>>> m -= np.sum(m * w, axis=None, keepdims=True) / v1
>>> cov = np.dot(m * w, m.T) * v1 / (v1**2 - ddof * v2)

Note that when a == 1, the normalization factor v1 / (v1**2 - ddof * v2) goes over to 1 / (np.sum(f) - ddof) as it should.
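The steps above can be wrapped in a function and compared against np.cov as a sanity check. This is a sketch under stated assumptions: manual_weighted_cov is a hypothetical helper name, the input is 2-D with variables in rows (so the weighted mean is taken along axis=1 rather than axis=None), and the data and weights are made up.

```python
import numpy as np

def manual_weighted_cov(m, f, a, ddof=1):
    """Weighted covariance following the documented steps (rows are variables)."""
    m = np.array(m, dtype=np.float64)  # copy, so the caller's array is not modified
    w = f * a                          # combined weight of each observation
    v1 = np.sum(w)
    v2 = np.sum(w * a)
    # Subtract the weighted mean of each variable.
    m -= np.sum(m * w, axis=1, keepdims=True) / v1
    return np.dot(m * w, m.T) * v1 / (v1**2 - ddof * v2)

# Hypothetical data: 2 variables, 4 observations.
m = np.array([[1.0, 2.0, 4.0, 7.0],
              [3.0, 1.0, 0.0, 5.0]])
f = np.array([1, 2, 1, 3])          # integer frequency weights
a = np.array([0.5, 1.0, 2.0, 1.5])  # observation weights

manual = manual_weighted_cov(m, f, a, ddof=1)
reference = np.cov(m, fweights=f, aweights=a, ddof=1)
print(np.allclose(manual, reference))
```

Setting every entry of a to 1 in this sketch reduces it to the frequency-weighted case, which matches the remark about the normalization factor.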

Examples

Consider two variables, \(x_0\) and \(x_1\), which correlate perfectly, but in opposite directions:

>>> x = np.array([[0, 2], [1, 1], [2, 0]]).T
>>> x
array([[0, 1, 2],
       [2, 1, 0]])

Note how \(x_0\) increases while \(x_1\) decreases. The covariance matrix shows this clearly:
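For reference, the covariance matrix for this x can be computed as sketched below. With the default normalization by (N - 1), each variable has variance 1 and the covariance between the two is -1.

```python
import numpy as np

x = np.array([[0, 2], [1, 1], [2, 0]]).T  # rows: x_0 = [0, 1, 2], x_1 = [2, 1, 0]

c = np.cov(x)
print(c)
# [[ 1. -1.]
#  [-1.  1.]]
```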

Note that element \(C_{0,1}\), which shows the correlation between \(x_0\) and \(x_1\), is negative.

Further, note how x and y are combined:

>>> x = [-2.1, -1, 4.3]
>>> y = [3, 1.1, 0.12]
>>> X = np.stack((x, y), axis=0)
>>> np.cov(X)
array([[11.71 , -4.286 ], # may vary
       [-4.286 , 2.144133]])
>>> np.cov(x, y)
array([[11.71 , -4.286 ], # may vary
       [-4.286 , 2.144133]])
>>> np.cov(x)
array(11.71)


Python | numpy.cov() function

Covariance provides a measure of how strongly two or more variables vary together. The covariance matrix element C_ij is the covariance of x_i and x_j. The element C_ii is the variance of x_i.

If COV(xi, xj) = 0, the variables are uncorrelated
If COV(xi, xj) > 0, the variables are positively correlated
If COV(xi, xj) < 0, the variables are negatively correlated
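The three cases above can be sketched as a small helper. The function name describe_covariance is hypothetical and the inputs are made-up; note also that a covariance of zero only rules out a linear relationship, not dependence in general.

```python
import numpy as np

def describe_covariance(x, y):
    """Classify two 1-D samples by the sign of their covariance (illustrative helper)."""
    c = np.cov(x, y)[0, 1]  # off-diagonal entry is COV(x, y)
    if np.isclose(c, 0.0):
        return "uncorrelated"
    return "positively correlated" if c > 0 else "negatively correlated"

print(describe_covariance([1, 2, 3], [2, 4, 6]))
print(describe_covariance([1, 2, 3], [3, 2, 1]))
print(describe_covariance([1, 2, 3, 4], [1, -1, -1, 1]))
```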

Syntax: numpy.cov(m, y=None, rowvar=True, bias=False, ddof=None, fweights=None, aweights=None)
Parameters:
m : [array_like] A 1-D or 2-D array of variables and observations. By default, each row is a variable and each column is an observation.
y : [array_like, optional] An additional set of variables and observations, in the same form as m.
rowvar : [bool, optional] If rowvar is True (default), then each row represents a variable, with observations in the columns. Otherwise, the relationship is transposed: each column represents a variable, while the rows contain observations.
bias : [bool, optional] If False (default), normalization is by (N - 1), where N is the number of observations. If True, normalization is by N.
ddof : [int, optional] If not None, the default value implied by bias is overridden. Note that ddof=1 will return the unbiased estimate, even if both fweights and aweights are specified.
fweights : [array_like, int, optional] 1-D array of integer frequency weights.
aweights : [array_like, optional] 1-D array of observation vector weights.
Returns: [ndarray] The covariance matrix of the variables.
