- How to Calculate Median in Python (with Examples)
- What Is the Median Value in Maths
- Why and When Is Median Value Useful
- How to Calculate the Median Value in Python
- How to Implement Median Function in Python
- How to Use a Built-In Median Function in Python
- Conclusion
- Further Reading
- median() function in Python statistics module
How to Calculate Median in Python (with Examples)
To calculate the median in Python, use the median() function from the built-in statistics module. For example, let’s calculate the median of a list of numbers:
import statistics

numbers = [1, 2, 3, 4, 5, 6, 7]
med = statistics.median(numbers)
print(med)
The median value is a common way to measure the “centrality” of a dataset.
If you are looking for a quick answer, the example above will do. But to learn what the median really is, why it is useful, and how to find it, read along.
This is a comprehensive guide to finding the median in Python.
What Is the Median Value in Maths
The median is the middle value of a dataset once its values are sorted.
If you have a sorted list of 3 numbers, the median is the second number, as it is in the middle.
But if you have a list of 4 values, there is no single “middle value”. For an even-sized dataset, the median is the average of the two middle values.
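To make the odd and even cases concrete, here is a minimal sketch using the statistics module from the standard library. The sample values and variable names are made up for illustration.

```python
import statistics

# Hypothetical sample values, chosen only for illustration.
three_values = [3, 1, 2]    # sorted: [1, 2, 3]
four_values = [4, 1, 3, 2]  # sorted: [1, 2, 3, 4]

# Odd number of values: the single middle value of the sorted data.
odd_median = statistics.median(three_values)

# Even number of values: the average of the two middle values, (2 + 3) / 2.
even_median = statistics.median(four_values)

print(odd_median)   # 2
print(even_median)  # 2.5
```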
Why and When Is Median Value Useful
When dealing with statistics, you usually want to have a single number that describes the nature of a dataset.
Think about your school grades, for example. Instead of looking at dozens of individual grades, you want to know the average (the mean).
Usually, measuring the “centrality” of a dataset means calculating the mean value. But if you have a skewed distribution, the mean value can be misleading.
Let’s say you drive to your nearby shopping mall 7 times. Usually, the drive takes around 10 minutes. But one day a traffic jam makes it last 2 hours.
Here is a list of driving times to the mall:
[9, 120, 10, 9, 10, 10, 10]
Now if you take the average of this list, you get ~25 minutes. But how well does this number really describe your trip?
As you can see, most of the time the trip takes around 10 minutes.
To better describe the driving time, you should use a median value instead. To calculate the median value, you need to sort the driving times first:
[9, 9, 10, 10, 10, 10, 120]
Then you can choose the middle value, which in this case is 10 minutes. 10 minutes describes your typical trip length way better than 25, right?
The median is useful here because the unusually high value of 120 does not affect it.
In short, use the median when the mean gives a misleading picture of the center of a dataset.
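You can check the driving-time numbers yourself. The sketch below uses mean() and median() from the statistics module on the list from this section. The variable names are just for illustration.

```python
import statistics

# Driving times to the mall in minutes, taken from the example above.
driving_times = [9, 120, 10, 9, 10, 10, 10]

mean_time = statistics.mean(driving_times)
median_time = statistics.median(driving_times)

print(round(mean_time, 1))  # 25.4
print(median_time)          # 10
```

The single 120-minute trip pulls the mean up to about 25 minutes, while the median stays at 10.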
How to Calculate the Median Value in Python
In Python, you can either create a function that calculates the median or use existing functionality.
How to Implement Median Function in Python
If you want to implement the median function yourself, you need to understand the procedure for finding the median.
The median function:
- Takes a dataset as input.
- Sorts the dataset.
- Checks whether the length of the dataset is odd or even.
- If the length is odd, the function picks the middle value and returns it.
- If the length is even, the function picks the two middle values, calculates their average, and returns the result.
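Here is how it looks in code:
def median(data):
    # Sort a copy so the caller's data is left untouched.
    sorted_data = sorted(data)
    data_len = len(sorted_data)
    # Index of the middle value (odd length) or the lower middle value (even length).
    middle = (data_len - 1) // 2
    if data_len % 2:
        # Odd length: return the single middle value.
        return sorted_data[middle]
    else:
        # Even length: return the average of the two middle values.
        return (sorted_data[middle] + sorted_data[middle + 1]) / 2.0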
numbers = [1, 2, 3, 4, 5, 6, 7]
med = median(numbers)
print(med)
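A quick way to gain confidence in a hand-written median is to compare it against statistics.median() on both odd-length and even-length inputs. The sketch below is one possible version of that check. The name my_median and the sample lists are illustrative assumptions, and the function follows the sort-then-pick-the-middle steps described above.

```python
import statistics


def my_median(data):
    """Hand-written median: sort, then pick or average the middle value(s)."""
    sorted_data = sorted(data)
    n = len(sorted_data)
    mid = n // 2
    if n % 2:
        # Odd length: one middle value.
        return sorted_data[mid]
    # Even length: average of the two middle values.
    return (sorted_data[mid - 1] + sorted_data[mid]) / 2


# Illustrative samples covering odd and even lengths, sorted and unsorted.
samples = [
    [5],
    [1, 2, 3, 4, 5],
    [4, 1, 3, 2],
    [9, 120, 10, 9, 10, 10, 10],
]

for sample in samples:
    assert my_median(sample) == statistics.median(sample)

print("all samples match")
```

Testing only an odd-length list, as in the example above, would not catch a mistake in the even-length branch, so it helps to include both.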
Now, this is a valid approach if you need to write the median function yourself. But with common maths operations, you should use a built-in function to save time and headaches.
Let’s next take a look at how to calculate the median with a built-in function in Python.
How to Use a Built-In Median Function in Python
Python’s standard library includes a module called statistics. This module contains useful mathematical tools for data science and statistics.
One of the useful functions in this module is median().
As the name suggests, this function calculates the median of a given dataset.
To use the median function from the statistics module, remember to import it into your project.
Here is an example of calculating the median for a bunch of numbers:
import statistics

numbers = [1, 2, 3, 4, 5, 6, 7]
med = statistics.median(numbers)
print(med)
Conclusion
Today you learned how to calculate the median value in Python.
To recap, the median value is a way to measure the centrality of a dataset. The median is useful when the average doesn’t properly describe the dataset and gives misleading results.
To calculate the median in Python, use the built-in median() function from the statistics module.
import statistics

numbers = [1, 2, 3, 4, 5, 6, 7]
med = statistics.median(numbers)
Thanks for reading. Happy coding!
Further Reading
median() function in Python statistics module
Python is a very popular language for data analysis and statistics. Python 3 provides the statistics module, which comes with useful functions such as mean(), median(), and mode().
The median() function in the statistics module calculates the median of a dataset. Its biggest advantage is that the data does not need to be sorted before it is passed to median().
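As a small illustration of that point, the sketch below passes a deliberately unsorted list to median(). The values and the name readings are made up for this example.

```python
import statistics

# Deliberately unsorted, hypothetical values.
readings = [7, 1, 5, 3, 9]

result = statistics.median(readings)

print(result)    # 5
print(readings)  # [7, 1, 5, 3, 9]
```

The second print shows that the original list keeps its order after the call.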
The median is the value that separates the higher half of a data sample or probability distribution from the lower half. For a dataset, it may be thought of as the middle value. In statistics and probability theory, the median is a measure of the central tendency of a dataset. The median has a big advantage over the mean: it is not skewed as much by extremely large or small values. The median is either one of the values in the dataset or lies between its two middle values.
For an odd number of elements, the median is the middle one.
For an even number of elements, the median is the mean of the two middle elements.
Median can be represented by the following formula :
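For a sample sorted in ascending order, $x_1 \le x_2 \le \dots \le x_n$ (this notation is assumed here for illustration), the odd and even cases described above are commonly written as:

```latex
\operatorname{median}(x) =
\begin{cases}
  x_{(n+1)/2}, & \text{if } n \text{ is odd}, \\[4pt]
  \dfrac{x_{n/2} + x_{n/2+1}}{2}, & \text{if } n \text{ is even}.
\end{cases}
```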
Syntax : median( [data-set] )
Parameters :
[data-set] : List or tuple or an iterable with a set of numeric values
Returns : Return the median (middle value) of the iterable containing the data
Exceptions : StatisticsError is raised when the data passed is empty.
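The parameter and exception notes above can be tried out with a short sketch like the following. The input values and variable names are hypothetical, chosen only to show a tuple, a generator, and an empty list.

```python
import statistics

# A tuple works the same way as a list.
tuple_median = statistics.median((3, 1, 2))

# So does a generator; this one yields 0, 1, 4, 9.
generator_median = statistics.median(x * x for x in range(4))

# Empty data raises StatisticsError.
try:
    statistics.median([])
    raised = False
except statistics.StatisticsError:
    raised = True

print(tuple_median)      # 2
print(generator_median)  # 2.5
print(raised)            # True
```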
Code #1 : Working