Python for comprehension filter

For Loop vs. List Comprehension

Many simple «for loops» in Python can be replaced with list comprehensions. You can often hear that list comprehension is «more Pythonic» (almost as if there was a scale for comparing how Pythonic something is 😉). In this article, I will compare their performance and discuss when a list comprehension is a good idea, and when it’s not.

Filter a list with a «for loop» #

Let’s use a simple scenario for a loop operation — we have a list of numbers, and we want to remove the odd ones. One important thing to keep in mind is that we can’t remove items from a list as we iterate over it. Instead, we have to create a new one containing only the even numbers:

# filter_list.py

MILLION_NUMBERS = list(range(1_000_000))

def for_loop():
output = []
for element in MILLION_NUMBERS:
if not element % 2:
output.append(element)
return output

if not element % 2 is equivalent to if element % 2 == 0 , but it’s slightly faster. I will write a separate article about comparing boolean values soon. Let’s measure the execution time of this function. I’m using Python 3.8 for benchmarks (you can read about the whole setup in the Introduction article):

$ python -m timeit -s "from filter_list import for_loop" "for_loop()"
5 loops, best of 5: 65.4 msec per loop

It takes 65 milliseconds to filter a list of one million elements. How fast will a list comprehension deal with the same task?

Читайте также:  Css выбрать элемент класса

Filter a list with list comprehension #

# filter_list.py

MILLION_NUMBERS = list(range(1_000_000))

def list_comprehension():
return [number for number in MILLION_NUMBERS if not number % 2]
$ python -m timeit -s "from filter_list import list_comprehension" "list_comprehension()"
5 loops, best of 5: 44.5 msec per loop

«For loop» is around 50% slower than a list comprehension (65.4/44.5≈1.47). And we just reduced five lines of code to one line! Cleaner and faster code? Great! Can we make it better?

Filter a list with the filter function #

Python has a built-in filter function for filtering collections of elements. This sounds like a perfect use case for our problem, so let’s see how fast it will be.

# filter_list.py

MILLION_NUMBERS = list(range(1_000_000))

def filter_function():
return filter(lambda x: not x % 2, MILLION_NUMBERS)
$ python -m timeit -s "from filter_list import filter_function" "filter_function()"
1000000 loops, best of 5: 284 nsec per loop

284 nanoseconds?! That’s suspiciously fast! It turns out that the filter function returns an iterator. It doesn’t immediately go over one million elements, but it will return the next value when we ask for it. To get all the results at once, we can convert this iterator to a list.

# filter_list.py

MILLION_NUMBERS = list(range(1_000_000))

def filter_return_list():
return list(filter(lambda x: not x % 2, MILLION_NUMBERS))
$ python -m timeit -s "from filter_list import filter_return_list" "filter_return_list()"
2 loops, best of 5: 104 msec per loop

Now, its performance is not so great anymore. It’s 133% slower than the list comprehension (104/44.5≈2.337) and 60% slower than the «for loop» (104/65.4≈1.590). While, in this case, it’s not the best solution, an iterator is an excellent alternative to a list comprehension when we don’t need to have all the results at once. If it turns out that we only need to get a few elements from the filtered list, an iterator will be a few orders of magnitude faster than other «non-lazy» solutions.

Читайте также:  To do list

We could use the filterfalse() function from the itertools library to simplify the filtering condition. filterfalse returns the opposite elements than filter . It picks those elements that evaluate to False. Unfortunately, it doesn’t make any difference when it comes to performance:

from itertools import filterfalse

def filterfalse_list():
return list(filterfalse(lambda x: x % 2, MILLION_NUMBERS))
$ python -m timeit -s "from filter_list import filterfalse_list" "filterfalse_list()"
2 loops, best of 5: 103 msec per loop

Why is list comprehension faster than a for loop? #

But why is the list comprehension faster than a for loop? When you use a for loop, on every iteration, you have to look up the variable holding the list and then call its append() function. This doesn’t happen in a list comprehension. Instead, there is a special bytecode instruction LIST_APPEND that will append the current value to the list you’re constructing.

More than one operation in the loop #

  • «fizzbuzz» if the number can be divided by 3 and 5
  • «fizz» if the number can be divided by 3
  • «buzz» if the number can be divided by 5
  • the number itself, if it can’t be divided by 3 or 5

Here is a simple solution:

# filter_list.py

def fizz_buzz():
output = []
for number in MILLION_NUMBERS:
if number % 3 == 0 and number % 5 == 0:
output.append('fizzbuzz')
elif number % 3 == 0:
output.append('fizz')
elif number % 5 == 0:
output.append('buzz')
else:
output.append(number)
return output

Here is the list comprehension equivalent of the fizz_buzz():

['fizzbuzz' if x % 3 == 0 and x % 5 == 0 else 'fizz' if x % 3 == 0 else 'buzz' if x % 5 == 0 else x for x in MILLION_NUMBERS]

It’s not easy to read — at least for me. It gets better if we split it into multiple lines:

[ 
"fizzbuzz" if x % 3 == 0 and x % 5 == 0
else "fizz" if x % 3 == 0
else "buzz" if x % 5 == 0
else x
for x in MILLION_NUMBERS
]

But if I see a list comprehension that spans multiple lines, I try to refactor it. We can extract the «if» statements into a separate function:

# filter_list.py

def transform(number):
if number % 3 == 0 and number % 5 == 0:
return 'fizzbuzz'
elif number % 3 == 0:
return 'fizz'
elif number % 5 == 0:
return 'buzz'
return number

def fizz_buzz2():
output = []
for number in MILLION_NUMBERS:
output.append(transform(number))
return output

Now it’s trivial to turn it into a list comprehension. And we get the additional benefit of a nice separation of logic into a function that does the «fizz buzz» check and a function that actually iterates over a list of numbers and applies the «fizz buzz» transformation.

Here is the improved list comprehension:

def fizz_buzz2_comprehension(): 
return [transform(number) for number in MILLION_NUMBERS]

Let’s compare all three versions:

$ python -m timeit -s "from filter_list import fizz_buzz" "fizz_buzz()"
2 loops, best of 5: 191 msec per loop

$ python -m timeit -s "from filter_list import fizz_buzz2" "fizz_buzz2()"
1 loop, best of 5: 285 msec per loop

$ python -m timeit -s "from filter_list import fizz_buzz2_comprehension" "fizz_buzz2_comprehension()"
1 loop, best of 5: 224 msec per loop

Extracting a separate function adds some overhead. List comprehension with a separate transform() function is around 17% slower than the initial «for loop»-based version (224/191≈1.173). But it’s much more readable, so I prefer it over the other solutions.

And, if you are curious, the one-line list comprehension mentioned before is the fastest solution:

def fizz_buzz_comprehension(): 
return [
"fizzbuzz" if x % 3 == 0 and x % 5 == 0
else "fizz" if x % 3 == 0
else "buzz" if x % 5 == 0
else x
for x in MILLION_NUMBERS
]
$ python -m timeit -s "from filter_list import fizz_buzz_comprehension" "fizz_buzz_comprehension()"
2 loops, best of 5: 147 msec per loop

Fastest, but also harder to read. If you run this code through a code formatter like black (which is a common practice in many projects), it will further obfuscate this function:

[ 
"fizzbuzz"
if x % 3 == 0 and x % 5 == 0
else "fizz"
if x % 3 == 0
else "buzz"
if x % 5 == 0
else x
for x in MILLION_NUMBERS
]

There is nothing wrong with black here — we are simply putting too much logic inside the list comprehension. If I had to say what the above code does, it would take me much longer to figure it out than if I had two separate functions. Saving a few hundred milliseconds of execution time and adding a few seconds of reading time doesn’t sound like a good trade-off 😉.

Clever one-liners can impress some recruiters during code interviews. But in real life, separating logic into different functions makes it much easier to read and document your code. And, statistically, we read more code than we write.

Conclusions #

List comprehensions are often not only more readable but also faster than using «for loops.» They can simplify your code, but if you put too much logic inside, they will instead become harder to read and understand.

Even though list comprehensions are popular in Python, they have a specific use case: when you want to perform some operations on a list and return another list. And they have limitations — you can’t break out of a list comprehension or put comments inside. In many cases, «for loops» will be your only choice.

I only scratched the surface of how useful list comprehension (or any other type of «comprehension» in Python) can be. If you want to learn more, Trey Hunner has many excellent articles and talks on this subject (for example, this one for beginners).

Don’t miss new articles

No spam, unsubscribe with one click.

Источник

Оцените статью