Counting most common items in a list in Python

I am trying to show the n most common items of a list, but I get the error: TypeError: unhashable type: 'list'

import collections

test = [[u'the\xa0official', u'MySQL'], [u'MySQL', u'repos'], [u'repos', u'for'], [u'for', u'Linux'], [u'Linux', u'a'], [u'a', u'little'], [u'little', u'over'], [u'over', u'a'], [u'a', u'year'], [u'year', u'ago,'], [u'ago,', u'the'], [u'the', u'offering'], [u'offering', u'has'], [u'has', u'grown'], [u'grown', u'steadily.\xa0Starting'], [u'steadily.\xa0Starting', u'off'], [u'off', u'with'], [u'with', u'support'], [u'support', u'for'], [u'for', u'the'], [u'the', u'Yum'], [u'Yum', u'based'], [u'based', u'family'], [u'family', u'of\xa0Red'], [u'of\xa0Red', u'Hat/Fedora/Oracle'], [u'Hat/Fedora/Oracle', u'Linux,'], [u'Linux,', u'we'], [u'we', u'added'], [u'added', u'Apt'], [u'Apt', u'repos'], [u'repos', u'for'], [u'for', u'Debian'], [u'Debian', u'and'], [u'and', u'Ubuntu'], [u'Ubuntu', u'in'], [u'in', u'late'], [u'late', u'spring,'], [u'spring,', u'and'], [u'and', u'throughout'], [u'throughout', u'all']]

print test[0]
print type(test)

print collections.Counter(test).most_common(3)  # raises TypeError: unhashable type: 'list'
>>> print collections.Counter(map(tuple,test)).most_common(3)
[((u'repos', u'for'), 2), ((u'and', u'throughout'), 1), ((u'based', u'family'), 1)]

collections.Counter is a dict subclass, so every item it counts is stored as a dictionary key. Keys must be hashable, and lists are mutable and therefore not hashable.
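To make the hashability point concrete, here is a minimal sketch (written for Python 3, unlike the Python 2 snippets in the question). The names `pair_as_list` and `pair_as_tuple` are made up for illustration; the sketch only checks that `hash()` accepts a tuple of strings, rejects a list, and that `Counter` fails in the same way.

```python
import collections

pair_as_list = ['repos', 'for']
pair_as_tuple = tuple(pair_as_list)

# A tuple of strings is immutable, so Python can compute a hash for it.
tuple_is_hashable = isinstance(hash(pair_as_tuple), int)

# A list is mutable and defines no hash, so hash() raises TypeError.
try:
    hash(pair_as_list)
    list_is_hashable = True
except TypeError:
    list_is_hashable = False

# Counter stores each counted item as a dictionary key, so it fails the same way.
try:
    collections.Counter([pair_as_list])
    counter_accepts_lists = True
except TypeError:
    counter_accepts_lists = False

print(tuple_is_hashable, list_is_hashable, counter_accepts_lists)
```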

If you want to count individual strings then you can extract the elements from each list using a generator expression, as below:

c = collections.Counter(word for pair in test for word in pair)
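As a small illustration of the flattening approach, the sketch below uses a made-up `sample` list in the same shape as `test` (it is not the data from the question) and is written for Python 3.

```python
import collections

# Hypothetical short sample in the same shape as `test` from the question.
sample = [['MySQL', 'repos'], ['repos', 'for'], ['for', 'Linux'], ['Apt', 'repos']]

# The outer loop walks the pairs, the inner loop walks the words in each pair,
# so Counter receives plain strings, which are hashable.
word_counts = collections.Counter(word for pair in sample for word in pair)

top_two = word_counts.most_common(2)
print(top_two)  # [('repos', 3), ('for', 2)]
```

Note that the pairs in the question overlap (each word ends one pair and starts the next), so counting words this way will likely count most words twice.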

If you want to count the pairs, for example as 2-grams, then you need to convert each inner list into a tuple (which is hashable) before passing it to Counter. This can again be done with a generator expression:

c2 = collections.Counter(tuple(pair) for pair in test)
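If the pairs are built from a flat list of words in the first place, one possible alternative is to generate them as tuples directly, so there is nothing to convert. This is only a sketch under that assumption: the `words` list is hypothetical, and the question itself starts from pairs that already exist. It is written for Python 3.

```python
import collections

# Hypothetical flat word list; not the data from the question.
words = ['repos', 'for', 'Linux', 'and', 'repos', 'for', 'Debian']

# zip(words, words[1:]) yields consecutive pairs as tuples, which are hashable.
bigram_counts = collections.Counter(zip(words, words[1:]))

top_bigram = bigram_counts.most_common(1)
print(top_bigram)  # [(('repos', 'for'), 2)]
```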

You need to change the inner lists to tuples so that they are hashable:

>>> from collections import Counter
>>> c = Counter(tuple(i) for i in test)
>>> c.most_common(3)
[(('repos', 'for'), 2),
 (('Hat/Fedora/Oracle', 'Linux,'), 1),
 (('year', 'ago,'), 1)]

As the error says, lists are not hashable. Another way to work around the problem is to go via strings: join each inner list with a separator (a space seems a good choice), do the count, and then split the keys again. This only round-trips correctly if no item contains the separator itself:

>>> [(i.split(' '),j) for i,j in collections.Counter(' '.join(i) for i in test).most_common(3)]
[([u'repos', u'for'], 2), ([u'grown', u'steadily.\xa0Starting'], 1), ([u'Linux', u'a'], 1)]
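The join-and-split approach depends on the separator never appearing inside an item. The sketch below, written for Python 3 with made-up `pairs`, shows what can go wrong when that assumption fails, and compares it with the tuple-based count.

```python
import collections

# Hypothetical pairs; some items contain an ASCII space.
pairs = [['New York', 'City'], ['New', 'York City']]

# Both pairs join to the same string, 'New York City', so they are merged.
joined_counts = collections.Counter(' '.join(pair) for pair in pairs)
round_tripped = [(key.split(' '), n) for key, n in joined_counts.most_common()]

# Tuples keep the two pairs distinct.
tuple_counts = collections.Counter(tuple(pair) for pair in pairs)

print(round_tripped)  # [(['New', 'York', 'City'], 2)]
print(len(tuple_counts))  # 2
```

In the question's data the items contain non-breaking spaces (`\xa0`) but no ASCII spaces, and `split(' ')` only splits on the ASCII space, which is why the output above comes back intact.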
