Get unique values from a list in python [duplicate]
Does the order matter? I.e. do you want the order of first occurrence, or would ['PBS', 'debate', 'job', 'thenandnow', 'nowplaying'] work as well?
All the top solutions work for the example in the question, but they don't answer the question. They all use set, which depends on the types found in the list. E.g. d = dict(); l = list(); l.append(d); set(l) will lead to TypeError: unhashable type: 'dict'. Using frozenset instead won't save you. Learn it the real Pythonic way: implement a nested n^2 loop for the simple task of removing duplicates from a list. You can then optimize it to n log n. Or implement real hashing for your objects. Or marshal your objects before creating a set from them.
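The TypeError mentioned in this comment, and the quadratic fallback it suggests, can be sketched as follows. The helper name unique_by_equality and the sample records are made up for illustration; the helper relies only on ==, so it accepts unhashable items at O(n^2) cost.

```python
def unique_by_equality(items):
    """Order-preserving de-duplication that only needs ==, not hashing.

    Hypothetical helper for illustration; runs in O(n^2) time.
    """
    unique = []
    for item in items:
        if item not in unique:  # linear scan that compares with ==
            unique.append(item)
    return unique


records = [{"tag": "PBS"}, {"tag": "job"}, {"tag": "PBS"}]

error_message = None
try:
    set(records)  # dicts are unhashable, so this raises TypeError
except TypeError as exc:
    error_message = str(exc)

deduped = unique_by_equality(records)
print(error_message)
print(deduped)
```

For a handful of items the quadratic scan is usually fine; for large lists a hashable key per item is likely the better route.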
If you need to preserve the order of the list: unique_items = list(dict.fromkeys(list_with_duplicates)) (CPython 3.6+)
30 Answers
First declare your list properly, separated by commas. You can get the unique values by converting the list to a set.
mylist = ['nowplaying', 'PBS', 'PBS', 'nowplaying', 'job', 'debate', 'thenandnow']
myset = set(mylist)
print(myset)
If you use it further as a list, you should convert it back with list(myset).
Another possibility, probably faster, would be to use a set from the beginning instead of a list. Then your code should be:
output = set()
for x in trends:
    output.add(x)
print(output)
As it has been pointed out, sets do not maintain the original order. If you need that, you should look for an ordered set implementation (see this question for more).
If you need to maintain the set order there is also a library on PyPI: pypi.python.org/pypi/ordered-set
"append" means to add to the end, which is accurate and makes sense for lists, but sets have no notion of ordering and hence no beginning or end, so "add" makes more sense for them.
The 'sets' module is deprecated, yes, so you don't have to 'import sets' to get the functionality. If you see import sets; output = sets.Set(), that's deprecated. This answer uses the built-in 'set' class: docs.python.org/2/library/stdtypes.html#set
To be consistent with the type (a list in, a list out), I would use list(set(mylist)).
@Ninjakannon your code will sort the list alphabetically. That does not have to be the order of the original list.
Note that a neat way to do this in Python 3 is mylist = [*{*mylist}]. This is an *arg-style set expansion followed by an *arg-style list expansion.
@LukeDavis best answer for me, sorted([*{*c}]) is 25% faster than sorted(list(set(c))) (measured with timeit.repeat with number=100000)
N.B.: This fails if the list has unhashable elements (e.g. elements which are themselves sets, lists or dicts).
If we need to keep the order of the elements, how about this:
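One possible workaround for unhashable elements is to de-duplicate on a hashable stand-in for each item. The sketch below uses a JSON string as that stand-in, which assumes every item is JSON-serializable; the name unique_by_json_key and the sample rows are invented for this example.

```python
import json


def unique_by_json_key(items):
    """Keep the first occurrence of each item, comparing by serialized form.

    Hypothetical helper; assumes every item can be passed to json.dumps.
    """
    seen = set()
    unique = []
    for item in items:
        # sort_keys=True makes dicts with the same content serialize identically
        key = json.dumps(item, sort_keys=True)
        if key not in seen:
            seen.add(key)
            unique.append(item)
    return unique


rows = [{"a": 1, "b": [1, 2]}, {"b": [1, 2], "a": 1}, {"a": 2}]
deduped_rows = unique_by_json_key(rows)
print(deduped_rows)
```

Note that equality is decided by the serialized text here, so values that compare equal in Python but serialize differently (such as 1 and 1.0) are treated as distinct.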
used = set()
mylist = [u'nowplaying', u'PBS', u'PBS', u'nowplaying', u'job', u'debate', u'thenandnow']
unique = [x for x in mylist if x not in used and (used.add(x) or True)]
And one more solution using reduce and without the temporary used var.
from functools import reduce  # needed on Python 3; reduce is a builtin on Python 2
mylist = [u'nowplaying', u'PBS', u'PBS', u'nowplaying', u'job', u'debate', u'thenandnow']
unique = reduce(lambda l, x: l.append(x) or l if x not in l else l, mylist, [])
UPDATE — Dec, 2020 — Maybe the best approach!
Starting from python 3.7, the standard dict preserves insertion order.
Changed in version 3.7: Dictionary order is guaranteed to be insertion order. This behavior was an implementation detail of CPython from 3.6.
So this gives us the ability to use dict.fromkeys() for de-duplication!
NOTE: Credit goes to @rlat for giving us this approach in the comments!
mylist = [u'nowplaying', u'PBS', u'PBS', u'nowplaying', u'job', u'debate', u'thenandnow']
unique = list(dict.fromkeys(mylist))
In terms of speed, for me it's fast enough and readable enough to become my new favorite approach!
UPDATE — March, 2019
And a 3rd solution, which is a neat one, but kind of slow since .index is O(n), which makes the whole comprehension O(n^2).
mylist = [u'nowplaying', u'PBS', u'PBS', u'nowplaying', u'job', u'debate', u'thenandnow']
unique = [x for i, x in enumerate(mylist) if i == mylist.index(x)]
UPDATE — Oct, 2016
Another solution with reduce, but this time without .append, which makes it more human-readable and easier to understand.
mylist = [u'nowplaying', u'PBS', u'PBS', u'nowplaying', u'job', u'debate', u'thenandnow']
unique = reduce(lambda l, x: l+[x] if x not in l else l, mylist, [])
# which can also be written as:
unique = reduce(lambda l, x: l if x in l else l+[x], mylist, [])
NOTE: Keep in mind that the more human-readable we get, the less performant the script is. The only exception is the dict.fromkeys() approach, which is Python 3.7+ specific.
import timeit
setup = "mylist = [u'nowplaying', u'PBS', u'PBS', u'nowplaying', u'job', u'debate', u'thenandnow']"

#10x to Michael for pointing out that we can get faster with set()
timeit.timeit('[x for x in mylist if x not in used and (used.add(x) or True)]', setup='used = set();'+setup)
0.2029558869980974

timeit.timeit('[x for x in mylist if x not in used and (used.append(x) or True)]', setup='used = [];'+setup)
0.28999493700030143

# 10x to rlat for suggesting this approach!
timeit.timeit('list(dict.fromkeys(mylist))', setup=setup)
0.31227896199925453

timeit.timeit('reduce(lambda l, x: l.append(x) or l if x not in l else l, mylist, [])', setup='from functools import reduce;'+setup)
0.7149233570016804

timeit.timeit('reduce(lambda l, x: l+[x] if x not in l else l, mylist, [])', setup='from functools import reduce;'+setup)
0.7379565160008497

timeit.timeit('reduce(lambda l, x: l if x in l else l+[x], mylist, [])', setup='from functools import reduce;'+setup)
0.7400134069976048

timeit.timeit('[x for i, x in enumerate(mylist) if i == mylist.index(x)]', setup=setup)
0.9154880290006986
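One caveat about the first two measurements: timeit runs the setup string once and the statement many times, so a used container created in setup is already full after the first repetition, and later repetitions mostly measure the membership test. A minimal sketch of a fairer comparison, with each approach wrapped in a function that builds its own fresh state, might look like this. The function names and the repetition count are arbitrary choices for illustration, and no particular timings are implied.

```python
import timeit
from functools import reduce

mylist = ['nowplaying', 'PBS', 'PBS', 'nowplaying', 'job', 'debate', 'thenandnow']


def with_seen_set(items):
    seen = set()  # fresh set on every call, unlike a set created in setup
    return [x for x in items if x not in seen and (seen.add(x) or True)]


def with_fromkeys(items):
    return list(dict.fromkeys(items))


def with_reduce(items):
    return reduce(lambda acc, x: acc if x in acc else acc + [x], items, [])


def with_index(items):
    return [x for i, x in enumerate(items) if i == items.index(x)]


candidates = [with_seen_set, with_fromkeys, with_reduce, with_index]

timings = {}
for func in candidates:
    # Each call does the full de-duplication from scratch.
    timings[func.__name__] = timeit.timeit(lambda: func(mylist), number=10_000)

for name, seconds in sorted(timings.items(), key=lambda item: item[1]):
    print(f"{name}: {seconds:.4f}s")
```

Results depend on the Python version, the machine and the input size, so it is worth rerunning this with data that resembles the real workload.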
ANSWERING COMMENTS
Because @monica asked a good question about "how is this working?", and for everyone having problems figuring it out, I will try to give a deeper explanation of how this works and what sorcery is happening here 😉
I try to understand why unique = [used.append(x) for x in mylist if x not in used] is not working.
Well it’s actually working
>>> used = []
>>> mylist = [u'nowplaying', u'PBS', u'PBS', u'nowplaying', u'job', u'debate', u'thenandnow']
>>> unique = [used.append(x) for x in mylist if x not in used]
>>> print used
[u'nowplaying', u'PBS', u'job', u'debate', u'thenandnow']
>>> print unique
[None, None, None, None, None]
The problem is that we are just not getting the desired results inside the unique variable, but only inside the used variable. This is because during the list comprehension .append modifies the used variable and returns None .
So in order to get the results into the unique variable, and still use the same logic with .append(x) if x not in used, we need to move this .append call to the right side of the list comprehension and just return x on the left side.
But if we are too naive and just go with:
>>> unique = [x for x in mylist if x not in used and used.append(x)]
>>> print unique
[]
We will get nothing in return.
Again, this is because the .append method returns None, which gives our logical expression the following look: x not in used and None
This will basically always evaluate to False when x is in used, and to None when x is not in used.
And in both cases (False / None), this will be treated as a falsy value and we will get an empty list as a result.
But why does this evaluate to None when x is not in used? Someone may ask.
Well, it's because this is how Python's short-circuit operators work.
The expression x and y first evaluates x; if x is false, its value is returned; otherwise, y is evaluated and the resulting value is returned.
So when x is not in used (i.e. when it's True), the next part of the expression will be evaluated (used.append(x)) and its value (None) will be returned.
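A small sketch of the two cases may help here. The variable names are chosen for this illustration only.

```python
used = []

# Left operand is falsy: `and` returns it and never evaluates the right side,
# so nothing is appended.
already_seen = False and used.append("never added")

# Left operand is truthy: the right side runs, and its value (None, the
# return value of list.append) becomes the value of the whole expression.
first_time = True and used.append("PBS")

print(already_seen, first_time, used)
```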
But that's exactly what we want in order to get the unique elements from a list with duplicates: we want to .append them to a new list only when we come across them for the first time.
So we really want to evaluate used.append(x) only when x is not in used. If there were a way to turn this None value into a truthy one, we would be fine, right?
Well, yes, and here is where the 2nd type of short-circuit operator comes into play.
The expression x or y first evaluates x; if x is true, its value is returned; otherwise, y is evaluated and the resulting value is returned.
We know that .append(x) will always be falsy, so if we just add an or next to it, we will always get the next part. That's why we write:
x not in used and (used.append(x) or True)
so we can evaluate used.append(x) and get True as a result, only when the first part of the expression (x not in used) is True .
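To make the combined expression concrete, the following sketch records the value of the filter for every element of the sample list. The results list exists only for this illustration.

```python
used = set()
mylist = ['nowplaying', 'PBS', 'PBS', 'nowplaying', 'job', 'debate', 'thenandnow']

results = []  # (element, value of the filter expression) pairs
for x in mylist:
    # First occurrence: `x not in used` is True, so used.add(x) runs,
    # returns None, and `None or True` yields True.
    # Repeat: `x not in used` is False, so the parenthesized part is skipped.
    keep = x not in used and (used.add(x) or True)
    results.append((x, keep))

unique = [x for x, keep in results if keep]
print(results)
print(unique)
```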
A similar trick can be seen in the 2nd approach, with the reduce method.
(l.append(x) or l) if x not in l else l
# similar to the above, but maybe more readable
# we return l unchanged when x is in l
# we append x to l and return l when x is not in l
l if x in l else (l.append(x) or l)
- Append x to l and return that l when x is not in l. Thanks to the or operator, .append is evaluated and l is returned after that.
- Return l untouched when x is in l
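The same two rules can be spelled out as a named function passed to reduce, which avoids the or trick entirely. This is only a sketch; add_if_new is a name invented for this example.

```python
from functools import reduce


def add_if_new(acc, x):
    """Reducer: append x to the accumulator only on its first occurrence."""
    if x not in acc:
        acc.append(x)
    return acc  # the accumulator is returned in both cases


mylist = ['nowplaying', 'PBS', 'PBS', 'nowplaying', 'job', 'debate', 'thenandnow']
unique = reduce(add_if_new, mylist, [])
print(unique)
```

Because x not in acc scans a list, this stays quadratic in the worst case, just like the lambda versions.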
How to make lists contain only distinct element in Python? [duplicate]
Please fix the title of your question. You're not talking about making lists distinct. You're talking about making list items distinct.
10 Answers
The simplest is to convert to a set then back to a list:
One disadvantage with this is that it won’t preserve the order. You may also want to consider if a set would be a better data structure to use in the first place, instead of a list.
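A minimal sketch of the round trip through a set, using a made-up my_list, could look like this. Since the resulting order is not guaranteed, the example sorts where a stable result is needed.

```python
my_list = [3, 1, 3, 2, 1]

# Round trip through a set: duplicates are dropped, order is not guaranteed.
distinct = list(set(my_list))

# If any deterministic order is acceptable, sorting the set gives one.
distinct_sorted = sorted(set(my_list))

print(distinct_sorted)
```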
@Ant Dictionary key order is preserved from Python 3.6, but it says "the order-preserving aspect of this new implementation is considered an implementation detail and should not be relied upon". Since they're both based on hashes, I'd think set would be the same, but it's not mentioned, so apparently not: docs.python.org/3.6/whatsnew/3.6.html
Preserve order, in a functional way:
In [23]: from functools import reduce
In [24]: reduce(lambda acc, elem: acc + [elem] if elem not in acc else acc, [2, 1, 2, 3, 3, 3, 4, 5], [])
Out[24]: [2, 1, 3, 4, 5]
def f(seq):  # Order preserving
    ''' Modified version of Dave Kirby solution '''
    seen = set()
    return [x for x in seq if x not in seen and not seen.add(x)]
OK, now how does it work, because it’s a little bit tricky here if x not in seen and not seen.add(x) :
In [1]: 0 not in [1,2,3] and not print('add')
add
Out[1]: True
Why does it return True? print (and set.add) returns None, and not None evaluates to True.
In [2]: 1 not in [1,2,3] and not print('add')
Out[2]: False
Why does it print 'add' in [1] but not in [2]? In [2] the expression becomes False and not print('add'), so and doesn't check the second argument, because it already knows the answer; and returns a truthy result only if both arguments are truthy.
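The same behavior can be observed with set.add itself by recording which elements actually reach it. The record_add wrapper and the calls list are hypothetical helpers added only to make the short-circuiting visible.

```python
calls = []  # elements for which the right-hand operand was evaluated


def record_add(seen, x):
    """Hypothetical wrapper around set.add that logs each call."""
    calls.append(x)
    return seen.add(x)  # set.add returns None, so `not ...` is True


seen = set()
seq = [2, 1, 2, 3, 3]
unique = [x for x in seq if x not in seen and not record_add(seen, x)]

print(calls)
print(unique)
```

The wrapper is called only for first occurrences; for repeats, x not in seen is already False and the right-hand side is skipped.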
More generic version, more readable, generator based, adds the ability to transform values with a function:
def f(seq, idfun=None):  # Order preserving
    return list(_f(seq, idfun))

def _f(seq, idfun=None):
    ''' Originally proposed by Andrew Dalke '''
    seen = set()
    if idfun is None:
        for x in seq:
            if x not in seen:
                seen.add(x)
                yield x
    else:
        for x in seq:
            x = idfun(x)
            if x not in seen:
                seen.add(x)
                yield x
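Note that the version above yields the transformed value idfun(x). If the original items should be kept and the function used only for comparison, a variant along these lines might be preferable. This is a sketch of a different design, not the answer's code; unique_by, key and the sample tags are names invented here.

```python
def unique_by(seq, key=None):
    """Yield the first item seen for each key, preserving input order.

    Hypothetical variant: `key` decides what counts as a duplicate,
    but the original item is what gets yielded.
    """
    seen = set()
    for x in seq:
        marker = x if key is None else key(x)
        if marker not in seen:
            seen.add(marker)
            yield x


tags = ['PBS', 'pbs', 'Job', 'job', 'debate']
first_spellings = list(unique_by(tags, key=str.lower))
print(first_spellings)
```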
def f(seq):  # Not order preserving
    return list(set(seq))