Python list количество вхождений

How do I count the occurrences of a list item?

Given a single item, how do I count occurrences of it in a list, in Python? A related but different problem is counting occurrences of each different element in a collection, getting a dictionary or list as a histogram result instead of a single integer. For that problem, see Using a dictionary to count the items in a list.

As discussed on meta, this question is nominally about counting a single item. Over the years, it’s attracted multiple (good) answers about the related but significantly different problem of histogramming; counting every different element, which should use an O(n) algorithm, not .count for every element which is O(n^2). Ideally those should be in (a) different Q&A(s), but for now they’re here.

30 Answers 30

If you only want a single item’s count, use the count method:

Important: this is very slow if you are counting multiple different items

Each count call goes over the entire list of n elements. Calling count in a loop n times means n * n total checks, which can be catastrophic for performance.

If you want to count multiple items, use Counter , which only does n total checks.

Counting unique elements in my case yielded the following timings: 114.19seconds with list.count() , 0.53 seconds with numpy.unique(list, return_counts = True) and 0.17 seconds with Counter . The difference is striking.

Читайте также:  Всплывающее окно css пример

Use Counter if you are using Python 2.7 or 3.x and you want the number of occurrences for each element:

>>> from collections import Counter >>> z = ['blue', 'red', 'blue', 'yellow', 'blue', 'red'] >>> Counter(z) Counter() 

I have found that when using this a lot (talking about millions of strings) that it is very slow because of its calls to isinstance . So if you are certain about the data that you’re working with, it might be better to write a custom function without type and instance checking.

@BramVanroy: What isinstance calls? Even with millions of strings, calling Counter only involves one isinstance call, to check whether its argument is a mapping. You most likely misjudged what’s eating all your time.

You misinterpreted what I meant: Counter checks the types of your data before it creates the Counter. This takes relatively much time and if you know the type of your data in advance. If you look at Counter’s update method, you’ll see it has to go through three if-statements before doing something. If you call update frequently, this adds up quickly. When you have control over your data and you know that the input will be indeed an iterable, then you can skip the first two checks. As I said, I only noticed this when working with millions of updates so it’s an edge case.

@BramVanroy: If you’re performing millions of updates rather than just counting millions of strings, that’s a different story. The optimization effort in Counter has gone into counting large iterables, rather than counting many iterables. Counting a million-string iterable will go faster with Counter than with a manual implementation. If you want to call update with many iterables, you may be able to speed things up by joining them into one iterable with itertools.chain .

In case you want to sort the results how-to-sort-counter-by-value-python —> x = Counter() x.most_common()

Counting the occurrences of one item in a list

For counting the occurrences of just one list item you can use count()

>>> l = ["a","b","b"] >>> l.count("a") 1 >>> l.count("b") 2 

Counting the occurrences of all items in a list is also known as «tallying» a list, or creating a tally counter.

Counting all items with count()

To count the occurrences of items in l one can simply use a list comprehension and the count() method

(or similarly with a dictionary dict((x,l.count(x)) for x in set(l)) )

>>> l = ["a","b","b"] >>> [[x,l.count(x)] for x in set(l)] [['a', 1], ['b', 2]] >>> dict((x,l.count(x)) for x in set(l))

Counting all items with Counter()

Alternatively, there’s the faster Counter class from the collections library

>>> l = ["a","b","b"] >>> from collections import Counter >>> Counter(l) Counter() 

How much faster is Counter?

I checked how much faster Counter is for tallying lists. I tried both methods out with a few values of n and it appears that Counter is faster by a constant factor of approximately 2.

Here is the script I used:

from __future__ import print_function import timeit t1=timeit.Timer('Counter(l)', \ 'import random;import string;from collections import Counter;n=1000;l=[random.choice(string.ascii_letters) for x in range(n)]' ) t2=timeit.Timer('[[x,l.count(x)] for x in set(l)]', 'import random;import string;n=1000;l=[random.choice(string.ascii_letters) for x in range(n)]' ) print("Counter(): ", t1.repeat(repeat=3,number=10000)) print("count(): ", t2.repeat(repeat=3,number=10000) 
Counter(): [0.46062711701961234, 0.4022796869976446, 0.3974247490405105] count(): [7.779430688009597, 7.962715800967999, 8.420845870045014] 

Counter is way faster for bigger lists. The list comprehension method is O(n^2), Counter should be O(n).

I have found that when using this a lot (talking about millions of strings) that it is very slow because of its calls to isinstance . So if you are certain about the data that you’re working with, it might be better to write a custom function without type and instance checking.

Another way to get the number of occurrences of each item, in a dictionary:

this looks like one of the constructs I often come up with in the heat of the battle, but it will run through a len(a) times which means quadratic runtime complexity (as each run depends on len(a) again).

@hugo24: A bit, but it won’t be asymptotically faster in the worst case; it will take n * (number of different items) operations, not counting the time it takes to build the set. Using collections.Counter is really much better.

very late to the party but wouldn’t following code throw an error if a list contained more than one instance of i , because it will try to enter multiple keys of same value in a dictionary. dict((i, a.count(i)) for i in a)

@rp1 you can try it for yourself and see that later key-value pairs just overwrite the previous entry for the same key, for example dict([(1, 2), (1, 3)]) returns

Given an item, how can I count its occurrences in a list in Python?

>>> l = list('aaaaabbbbcccdde') >>> l ['a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'c', 'c', 'c', 'd', 'd', 'e'] 

list.count

There’s the list.count method

This works fine for any list. Tuples have this method as well:

>>> t = tuple('aabbbffffff') >>> t ('a', 'a', 'b', 'b', 'b', 'f', 'f', 'f', 'f', 'f', 'f') >>> t.count('f') 6 

collections.Counter

And then there’s collections.Counter. You can dump any iterable into a Counter, not just a list, and the Counter will retain a data structure of the counts of the elements.

>>> from collections import Counter >>> c = Counter(l) >>> c['b'] 4 

Counters are based on Python dictionaries, their keys are the elements, so the keys need to be hashable. They are basically like sets that allow redundant elements into them.

Further usage of collections.Counter

You can add or subtract with iterables from your counter:

>>> c.update(list('bbb')) >>> c['b'] 7 >>> c.subtract(list('bbb')) >>> c['b'] 4 

And you can do multi-set operations with the counter as well:

>>> c2 = Counter(list('aabbxyz')) >>> c - c2 # set difference Counter() >>> c + c2 # addition of all elements Counter() >>> c | c2 # set union Counter() >>> c & c2 # set intersection Counter() 

Silly answer, sum

There are good builtin answers, but this example is slightly instructive. Here we sum all the occurences where the character, c, is equal to ‘b’ :

Not great for this use-case, but if you need to have a count of iterables where the case is True it works perfectly fine to sum the boolean results, since True is equivalent to 1 .

Why not pandas?

Pandas is a common library, but it’s not in the standard library. Adding it as a requirement is non-trivial.

There are builtin solutions for this use-case in the list object itself as well as in the standard library.

If your project does not already require pandas, it would be foolish to make it a requirement just for this functionality.

While «why not Pandas» is appropriate, it should probably be accompanied by «when to use NumPy», i.e. for large numeric arrays. The deciding factor isn’t just project limitations, there are memory efficiencies with NumPy which become apparent with big data.

Thanks for mentioning Pandas/etc as a serious dependency. Some of these packages have negative side effects. So addition of these assets for trivial needs can cost a lot of time and $. Personally I have experienced Numpy and SciPi adding 30 min to our CI pipeline and it took days to get the package caching correctly. Great packages, but sometimes there is hidden expense. +1’d

I’ve compared all suggested solutions (and a few new ones) with perfplot (a small project of mine).

Counting one item

For large enough arrays, it turns out that

is slightly faster than the other solutions.

enter image description here

Counting all items

enter image description here

Code to reproduce the plots:

from collections import Counter from collections import defaultdict import numpy import operator import pandas import perfplot def counter(a): return Counter(a) def count(a): return dict((i, a.count(i)) for i in set(a)) def bincount(a): return numpy.bincount(a) def pandas_value_counts(a): return pandas.Series(a).value_counts() def occur_dict(a): d = <> for i in a: if i in d: d[i] = d[i]+1 else: d[i] = 1 return d def count_unsorted_list_items(items): counts = defaultdict(int) for item in items: counts[item] += 1 return dict(counts) def operator_countof(a): return dict((i, operator.countOf(a, i)) for i in set(a)) perfplot.show( setup=lambda n: list(numpy.random.randint(0, 100, n)), n_range=[2**k for k in range(20)], kernels=[ counter, count, bincount, pandas_value_counts, occur_dict, count_unsorted_list_items, operator_countof ], equality_check=None, logx=True, logy=True, ) 
from collections import Counter from collections import defaultdict import numpy import operator import pandas import perfplot def counter(a): return Counter(a) def count(a): return dict((i, a.count(i)) for i in set(a)) def bincount(a): return numpy.bincount(a) def pandas_value_counts(a): return pandas.Series(a).value_counts() def occur_dict(a): d = <> for i in a: if i in d: d[i] = d[i] + 1 else: d[i] = 1 return d def count_unsorted_list_items(items): counts = defaultdict(int) for item in items: counts[item] += 1 return dict(counts) def operator_countof(a): return dict((i, operator.countOf(a, i)) for i in set(a)) b = perfplot.bench( setup=lambda n: list(numpy.random.randint(0, 100, n)), n_range=[2 ** k for k in range(20)], kernels=[ counter, count, bincount, pandas_value_counts, occur_dict, count_unsorted_list_items, operator_countof, ], equality_check=None, ) b.save("out.png") b.show() 

Источник

Python: count repeated elements in the list [duplicate]

I am new to Python. I am trying to find a simple way of getting a count of the number of elements repeated in a list e.g.

5 Answers 5

You can do that using count :

my_dict = >>> print my_dict #or print(my_dict) in python-3.x
from collections import Counter a = dict(Counter(MyList)) >>> print a #or print(a) in python-3.x

This solution is super slow. Much faster using pandas: pd.DataFrame(MyList, columns=[«x»]).groupby(‘x’).size().to_dict()

if you use a ‘set’ for get unique values in the ‘for’ it will go faster: setList = list(set(Mylist)) my_dict =

>>> from collections import Counter >>> MyList = ["a", "b", "a", "c", "c", "a", "c"] >>> c = Counter(MyList) >>> c Counter() 

This works for Python 2.6.6

a = ["a", "b", "a"] result = dict((i, a.count(i)) for i in a) print result 
duplicateFrequencies = <> for i in set(yourList): duplicateFrequencies[i] = yourList.count(i) 
In [2]: MyList = ["a", "b", "a", "c", "c", "a", "c"] In [3]: count = <> In [4]: for i in MyList: . if not i in count: . count[i] = 1 . else: . count[i] +=1 . In [5]: count Out[5]:

Linked

Hot Network Questions

Site design / logo © 2023 Stack Exchange Inc; user contributions licensed under CC BY-SA . rev 2023.7.27.43548

By clicking “Accept all cookies”, you agree Stack Exchange can store cookies on your device and disclose information in accordance with our Cookie Policy.

Источник

Оцените статью