Find the standard deviation in Python

How to Calculate the Standard Deviation of a List in Python

You can use one of the following three methods to calculate the standard deviation of a list in Python:

Method 1: Use the NumPy library

import numpy as np

#calculate standard deviation of list
np.std(my_list)

Method 2: Use the statistics library

import statistics as stat

#calculate standard deviation of list
stat.stdev(my_list)

Method 3: Use a custom formula

#calculate sample standard deviation of list
(sum((x - (sum(my_list) / len(my_list))) ** 2 for x in my_list) / (len(my_list) - 1)) ** 0.5

The following examples show how to use each of these methods in practice.

Method 1: Calculate the standard deviation using NumPy

The following code shows how to calculate both the sample standard deviation and the population standard deviation of a list using NumPy:

import numpy as np

#define list
my_list = [3, 5, 5, 6, 7, 8, 13, 14, 14, 17, 18]

#calculate sample standard deviation of list
np.std(my_list, ddof=1)

5.310367218940701

#calculate population standard deviation of list
np.std(my_list)

5.063236478416116

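For a given dataset, the population standard deviation is never larger than the sample standard deviation. The reason is the divisor: the population version divides by n, and the sample version divides by n - 1. The population value is therefore strictly smaller unless all values are identical, in which case both are 0.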
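The two values differ only in that divisor. It follows that the sample value equals the population value multiplied by sqrt(n / (n - 1)). The quick check below uses the same list and only the `np.std` calls shown above. It is a sketch for illustration, not part of the original method.

```python
import math

import numpy as np

my_list = [3, 5, 5, 6, 7, 8, 13, 14, 14, 17, 18]
n = len(my_list)

population_sd = np.std(my_list)         # divisor n (ddof=0)
sample_sd = np.std(my_list, ddof=1)     # divisor n - 1

# Rescaling the population value by sqrt(n / (n - 1)) recovers the sample value
rescaled = population_sd * math.sqrt(n / (n - 1))
print(sample_sd, rescaled)
print(math.isclose(sample_sd, rescaled))  # True
```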

Method 2: Calculate the standard deviation using the statistics library

The following code shows how to calculate both the sample standard deviation and the population standard deviation of a list using Python's statistics library:

import statistics as stat

#define list
my_list = [3, 5, 5, 6, 7, 8, 13, 14, 14, 17, 18]

#calculate sample standard deviation of list
stat.stdev(my_list)

5.310367218940701

#calculate population standard deviation of list
stat.pstdev(my_list)

5.063236478416116
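The statistics library also differs from NumPy in how it handles very short inputs:

- `statistics.stdev()` divides by n - 1, so it needs at least two data points and raises `statistics.StatisticsError` otherwise.
- `statistics.pstdev()` accepts a single value.

The minimal sketch below uses an illustrative one-element list.

```python
import statistics as stat

single = [42]

# Population standard deviation of one value is simply 0.0
print(stat.pstdev(single))  # 0.0

# Sample standard deviation would divide by n - 1 = 0, so the library refuses
try:
    stat.stdev(single)
except stat.StatisticsError as err:
    print("stdev failed:", err)
```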

Method 3: Calculate the standard deviation using a custom formula

The following code shows how to calculate both the sample standard deviation and the population standard deviation of a list without importing any Python libraries:

#define list
my_list = [3, 5, 5, 6, 7, 8, 13, 14, 14, 17, 18]

#calculate sample standard deviation of list
(sum((x - (sum(my_list) / len(my_list))) ** 2 for x in my_list) / (len(my_list) - 1)) ** 0.5

5.310367218940701

#calculate population standard deviation of list
(sum((x - (sum(my_list) / len(my_list))) ** 2 for x in my_list) / len(my_list)) ** 0.5

5.063236478416116
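The one-liners above recompute `sum(my_list) / len(my_list)` for every element, so their cost grows quadratically with the list length. The sketch below computes the mean only once and selects the divisor through a `ddof` argument. The function name `std_dev` and its signature are hypothetical: the `ddof` argument is modelled on NumPy's parameter of the same name, not taken from any library.

```python
import math


def std_dev(values, ddof=0):
    """Hypothetical helper: ddof=0 gives the population SD, ddof=1 the sample SD."""
    n = len(values)
    if n - ddof <= 0:
        raise ValueError("need more than ddof data points")
    mean = sum(values) / n  # computed once, not once per element
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared_deviations / (n - ddof))


my_list = [3, 5, 5, 6, 7, 8, 13, 14, 14, 17, 18]
print(std_dev(my_list, ddof=1))  # sample standard deviation
print(std_dev(my_list))          # population standard deviation
```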

Note that all three methods produce the same values for the standard deviation of the list.

Source

numpy.std

Compute the standard deviation along the specified axis.

Returns the standard deviation, a measure of the spread of a distribution, of the array elements. The standard deviation is computed for the flattened array by default, otherwise over the specified axis.

Parameters:

a : array_like

Calculate the standard deviation of these values.

axis : None or int or tuple of ints, optional

Axis or axes along which the standard deviation is computed. The default is to compute the standard deviation of the flattened array.

If this is a tuple of ints, a standard deviation is performed over multiple axes, instead of a single axis or all the axes as before.
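For a 3-D array, passing a tuple of axes reduces over all of those axes at once. Reducing over every axis gives the same result as the default flattened computation. The array below is an illustrative assumption.

```python
import numpy as np

# Shape (2, 3, 4): 2 blocks, each with 3 rows of 4 values
a = np.arange(24, dtype=float).reshape(2, 3, 4)

per_block = np.std(a, axis=(1, 2))       # one value per block -> shape (2,)
everything = np.std(a, axis=(0, 1, 2))   # reduce over all axes

print(per_block.shape)                    # (2,)
print(np.isclose(everything, np.std(a)))  # True
```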

dtype : dtype, optional

Type to use in computing the standard deviation. For arrays of integer type the default is float64, for arrays of float types it is the same as the array type.

out : ndarray, optional

Alternative output array in which to place the result. It must have the same shape as the expected output but the type (of the calculated values) will be cast if necessary.

ddof : int, optional

Means Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements. By default ddof is zero.
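The divisor rule can be checked directly. Dividing the summed squared deviations by N - ddof and taking the square root should reproduce `np.std` for each `ddof`. The data is the list from the earlier examples.

```python
import numpy as np

x = np.array([3, 5, 5, 6, 7, 8, 13, 14, 14, 17, 18], dtype=float)
N = x.size
squared_deviations = ((x - x.mean()) ** 2).sum()

for ddof in (0, 1, 2):
    manual = np.sqrt(squared_deviations / (N - ddof))  # divisor N - ddof
    print(ddof, manual, np.isclose(manual, np.std(x, ddof=ddof)))
```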

keepdims : bool, optional

If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the input array.
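A common use of `keepdims=True` is standardizing each row of a 2-D array. The reduced axis is kept with length one, so the result broadcasts against the original array without reshaping. The array values here are illustrative.

```python
import numpy as np

a = np.array([[1.0, 2.0, 3.0, 4.0],
              [10.0, 20.0, 30.0, 40.0]])

row_mean = a.mean(axis=1, keepdims=True)    # shape (2, 1)
row_std = np.std(a, axis=1, keepdims=True)  # shape (2, 1)
print(row_std.shape)                        # (2, 1)

# Broadcasting (2, 4) against (2, 1) gives per-row z-scores
z = (a - row_mean) / row_std
print(z)
```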

If the default value is passed, then keepdims will not be passed through to the std method of sub-classes of ndarray , however any non-default value will be. If the sub-class’ method does not implement keepdims any exceptions will be raised.

where : array_like of bool, optional

Elements to include in the standard deviation. See reduce for details.

Returns: standard_deviation : ndarray

If out is None, return a new array containing the standard deviation, otherwise return a reference to the output array.

Notes:

The standard deviation is the square root of the average of the squared deviations from the mean, i.e., std = sqrt(mean(x)), where x = abs(a - a.mean())**2.

The average squared deviation is typically calculated as x.sum() / N, where N = len(x). If, however, ddof is specified, the divisor N - ddof is used instead. In standard statistical practice, ddof=1 provides an unbiased estimator of the variance of the infinite population. ddof=0 provides a maximum likelihood estimate of the variance for normally distributed variables. The standard deviation computed in this function is the square root of the estimated variance, so even with ddof=1, it will not be an unbiased estimate of the standard deviation per se.

Note that, for complex numbers, std takes the absolute value before squaring, so that the result is always real and nonnegative.
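The formula from the notes, `sqrt(mean(abs(a - a.mean())**2))`, can be written out by hand and compared with `np.std`. Using complex input also shows the absolute value being taken before squaring, which keeps the result real. The input values are illustrative.

```python
import numpy as np

a = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j])

x = abs(a - a.mean()) ** 2   # real, nonnegative squared deviations
by_hand = np.sqrt(np.mean(x))
print(by_hand)
print(np.std(a))             # same value, returned as a real float
```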

For floating-point input, the std is computed using the same precision the input has. Depending on the input data, this can cause the results to be inaccurate, especially for float32 (see example below). Specifying a higher-accuracy accumulator using the dtype keyword can alleviate this issue.

Examples:

>>> a = np.array([[1, 2], [3, 4]])
>>> np.std(a)
1.1180339887498949 # may vary
>>> np.std(a, axis=0)
array([1., 1.])
>>> np.std(a, axis=1)
array([0.5, 0.5])

In single precision, std() can be inaccurate:

>>> a = np.zeros((2, 512*512), dtype=np.float32)
>>> a[0, :] = 1.0
>>> a[1, :] = 0.1
>>> np.std(a)
0.45000005

Computing the standard deviation in float64 is more accurate:

>>> np.std(a, dtype=np.float64)
0.44999999925494177 # may vary

Specifying a where argument:

>>> a = np.array([[14, 8, 11, 10], [7, 9, 10, 11], [10, 15, 5, 10]])
>>> np.std(a)
2.614064523559687 # may vary
>>> np.std(a, where=[[True], [True], [False]])
2.0

Source

Calculate Standard Deviation in Python

Standard deviation is an important metric for measuring the spread of data. It is useful for describing data, in statistical testing, and more. There are several ways to calculate the standard deviation of a list of values in Python, and this tutorial covers them with examples.


In this tutorial, we will look at –


  • What is standard deviation?
  • Manually calculate standard deviation
  • How to calculate standard deviation of a list in Python?
  • Standard deviation of a numpy array
  • Standard deviation of a pandas series

What is standard deviation?

Standard deviation is a measure of spread in the data. A higher standard deviation means the data is more spread out; a lower one means the values are clustered more closely around the mean. It is calculated as the square root of the variance. The following is the formula for the standard deviation.

Population standard deviation formula:

$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2}$$

Note that the above is the formula for the population standard deviation. For sample standard deviation, we use the sample mean in place of the population mean and (sample size – 1) in place of the population size.
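In the same notation, for n observations $x_1, \dots, x_n$ with sample mean $\bar{x}$, the sample standard deviation is:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$$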

Both variance and standard deviation are measures of spread but the standard deviation is more commonly used. This is because the standard deviation is in the same units as the data.

Manually calculate standard deviation

Before we compute the standard deviation in Python, let's calculate it by hand to see what is happening. For example, let's calculate the standard deviation of the list of values [7, 2, 4, 3, 9, 12, 10, 1].

To calculate the standard deviation, let’s first calculate the mean of the list of values.

$$\mu = \frac{7 + 2 + 4 + 3 + 9 + 12 + 10 + 1}{8} = \frac{48}{8} = 6$$

The mean comes out to be six (μ = 6).

Next, following the formula above, we sum the squared differences between each value and the mean. We then divide this sum by the number of values (N = 8) to get the variance.

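$$\sigma^2 = \frac{(7-6)^2 + (2-6)^2 + (4-6)^2 + (3-6)^2 + (9-6)^2 + (12-6)^2 + (10-6)^2 + (1-6)^2}{8} = \frac{116}{8} = 14.5$$

The variance comes out to be 14.5.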

The standard deviation can then be calculated by taking the square root of the variance.

$$\sigma = \sqrt{14.5} \approx 3.808$$
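The same three steps (mean, variance, square root) can be retraced in a few lines of plain Python on the list [7, 2, 4, 3, 9, 12, 10, 1]. This is a sketch that mirrors the hand calculation, dividing by N for the population value.

```python
import math

values = [7, 2, 4, 3, 9, 12, 10, 1]
n = len(values)

mean = sum(values) / n                               # 48 / 8 = 6.0
variance = sum((x - mean) ** 2 for x in values) / n  # 116 / 8 = 14.5
std_dev = math.sqrt(variance)                        # about 3.808

print(mean, variance, std_dev)
```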

How to calculate standard deviation in Python?

There are a number of ways to compute standard deviation in Python. You can write your own function to calculate the standard deviation or use off-the-shelf methods from numpy or pandas.

Let’s write a vanilla implementation of calculating std dev from scratch in Python without using any external libraries.

def get_std_dev(ls):
    n = len(ls)
    mean = sum(ls) / n
    var = sum((x - mean)**2 for x in ls) / n
    std_dev = var ** 0.5
    return std_dev

# create a list of data points
ls = [7, 2, 4, 3, 9, 12, 10, 2]
get_std_dev(ls)

Here, we created a function that returns the (population) standard deviation of a list of values. Notice that we used Python's built-in sum() function to compute the sums needed for both the mean and the variance. It returns the total of the items in the sequence passed to it.

The above method is not the only way to get the standard deviation of a list of values. You can store the values as a numpy array or a pandas series and then use the simple one-line implementations for calculating standard deviations from these libraries.

Standard deviation of a numpy array

You can store the list of values as a numpy array and then call the ndarray std() method to calculate the standard deviation directly. Here's an example:

import numpy as np

# list of data points
ls = [7, 2, 4, 3, 9, 12, 10, 2]

# create numpy array of list values
ar = np.array(ls)

# get the standard deviation
print(ar.std())

You can see that we get the same result as above.

Standard deviation of a pandas series

You can also store the list of values as a pandas series and then compute its standard deviation using the pandas Series std() method.

This method is very similar to the numpy array method. In fact, under the hood, a number of pandas methods are wrappers on numpy methods.

Let’s compute the standard deviation of the same list of values using pandas this time.

import pandas as pd

# list of data points
ls = [7, 2, 4, 3, 9, 12, 10, 2]

# create pandas series of list values
col = pd.Series(ls)

# get the standard deviation
print(col.std())

You can see that the result is higher compared to the previous two examples. This is because pandas calculates the sample standard deviation by default (normalizing by N – 1). To get the population standard deviation, pass ddof = 0 to the std() function.

# get the population standard deviation
print(col.std(ddof=0))

Now we get the same standard deviation as the above two examples.

Note that pandas is generally used for working with two-dimensional data and offers a range of methods to manipulate, aggregate, and analyze data. For example, you can calculate the standard deviation of each column in a pandas dataframe.
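For example, `DataFrame.std()` returns one value per column. Like `Series.std()`, it defaults to the sample standard deviation (`ddof=1`) and accepts `ddof=0` for the population version. The column names `a` and `b` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "a": [7, 2, 4, 3, 9, 12, 10, 2],  # same values as the examples above
    "b": [1, 1, 1, 1, 1, 1, 1, 1],    # constant column, so its std is 0
})

print(df.std())        # sample standard deviation per column (ddof=1)
print(df.std(ddof=0))  # population standard deviation per column
```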

With this, we come to the end of this tutorial. The code examples and results presented in this tutorial have been implemented in a Jupyter Notebook with a python (version 3.8.3) kernel having numpy version 1.18.5 and pandas version 1.0.5


Author

Piyush is a data professional passionate about using data to understand things better and make informed decisions. He has experience working as a Data Scientist in the consulting domain and holds an engineering degree from IIT Roorkee. His hobbies include watching cricket, reading, and working on side projects. View all posts

Data Science Parichay is an educational website offering easy-to-understand tutorials on topics in Data Science with the help of clear and fun examples.

Source
