
How to sort a list by frequency and then alphabetically?

from collections import Counter

def count_words(s, n):
    """Return the n most frequently occurring words in s."""

    # TODO: Count the number of occurrences of each word in s

    words = s.split()

    counts = Counter(words)

    # TODO: Sort the occurrences in descending order (alphabetically in case of ties)

    # TODO: Return the top n most frequent words.
    return counts.most_common(n)

print(count_words("betty bought a bit of butter but the butter was bitter", 3))

The current output is:

[('butter', 2), ('a', 1), ('bitter', 1)]

But the required one is:

[('butter', 2), ('a', 1), ('betty', 1)]

Words with the same frequency have to be sorted alphabetically. So how can I sort 'counts' by frequency first and then alphabetically?

As indicated by the Python docs for Counter:

most_common([n])

Return a list of the n most common elements and their counts from the most common to the least. If n is omitted or None, most_common() returns all elements in the counter. Elements with equal counts are ordered arbitrarily.

So the words listed with a count of 1 are not guaranteed to come out in any particular order, because the underlying structure is a dict.
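On Python 3.7 and later, dicts keep insertion order and most_common() breaks ties by the order in which elements were first encountered. That is deterministic, but it is still not alphabetical. A quick check with the question's sentence, assuming Python 3.7+:

```python
from collections import Counter

text = "betty bought a bit of butter but the butter was bitter"
counts = Counter(text.split())

# Words tied at count 1 come out in first-seen order, not A-Z.
top3 = counts.most_common(3)
print(top3)  # [('butter', 2), ('betty', 1), ('bought', 1)]
```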

If you want your results alphabetically, you'll need to do some more processing.

from collections import Counter

c = Counter("betty bought a bit of butter but the butter was bitter".split())  # count the words

print(sorted(c.most_common(), key=lambda i: (-i[1], i[0]))[:3])

This grabs all of your results first via .most_common(), then sorts them by the second element (the word frequency) in descending order, and then by the first element (the word) in ascending order. Negating the count in the key is what makes the frequency part descending while the words stay ascending. Finally, it takes a slice of the first 3 elements as the result.
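Folding that into the question's count_words gives something like the following sketch (Python 3). It sorts counts.items() directly, which holds the same (word, count) pairs that most_common() returns when called with no argument.

```python
from collections import Counter

def count_words(s, n):
    """Return the n most frequently occurring words in s.

    Ties in frequency are broken alphabetically.
    """
    counts = Counter(s.split())
    # Count descending (via negation), then word ascending.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:n]

print(count_words("betty bought a bit of butter but the butter was bitter", 3))
# [('butter', 2), ('a', 1), ('betty', 1)]
```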

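Edit: I realized that I wasn't sorting properly at first. itemgetter combined with reverse=True reverses every field of the key, so it can't sort the count in descending order and the word in ascending order in a single pass.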
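For illustration, this is the pitfall the edit describes: with itemgetter(1, 0) and reverse=True, the reversal applies to the whole (count, word) key, so tied words come out in reverse alphabetical order. A small sketch with a sample list:

```python
from operator import itemgetter

L = [('butter', 2), ('a', 1), ('bitter', 1), ('betty', 1)]

# reverse=True flips both fields, so ties are broken Z-A instead of A-Z.
wrong = sorted(L, key=itemgetter(1, 0), reverse=True)
print(wrong)  # [('butter', 2), ('bitter', 1), ('betty', 1), ('a', 1)]
```

A key that negates only the count, as in the sorted() call above, avoids this because only the count's direction is flipped.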

You can do this by specifying a key function that negates the count, so counts sort in descending order while words sort in ascending order:

>>> L = [('butter', 2), ('a', 1), ('bitter', 1), ('betty', 1)]
>>> sorted(L, key=lambda x: (-x[1], x[0]))
[('butter', 2), ('a', 1), ('betty', 1), ('bitter', 1)]
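If you only need the top n entries, one alternative (not part of the original answers) is heapq.nsmallest, which takes the same key and is documented as equivalent to sorted(iterable, key=key)[:n]. A sketch, assuming Python 3:

```python
import heapq
from collections import Counter

counts = Counter("betty bought a bit of butter but the butter was bitter".split())

# The n "smallest" (-count, word) keys are the highest counts,
# with ties broken alphabetically.
top3 = heapq.nsmallest(3, counts.items(), key=lambda kv: (-kv[1], kv[0]))
print(top3)  # [('butter', 2), ('a', 1), ('betty', 1)]
```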

Since Python's sort is stable, another way is to sort alphabetically first and then do a reversed sort by count:

>>> from operator import itemgetter
>>> sorted(sorted(L), key=itemgetter(1), reverse=True)
[('butter', 2), ('a', 1), ('betty', 1), ('bitter', 1)]
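The same stable-sort trick extends to any number of fields with mixed directions: sort by the least significant key first and finish with the most significant one. A sketch; the helper name multisort and its (key_func, reverse) spec format are hypothetical, chosen only for illustration:

```python
from operator import itemgetter

def multisort(items, specs):
    """Sort items by several keys.

    specs is a list of (key_func, reverse) pairs, most significant first.
    Relies on list.sort being stable, including with reverse=True.
    """
    result = list(items)  # copy so the input is left untouched
    # Apply the least significant key first.
    for key, reverse in reversed(specs):
        result.sort(key=key, reverse=reverse)
    return result

L = [('butter', 2), ('a', 1), ('bitter', 1), ('betty', 1)]
# Count descending, then word ascending.
ordered = multisort(L, [(itemgetter(1), True), (itemgetter(0), False)])
print(ordered)  # [('butter', 2), ('a', 1), ('betty', 1), ('bitter', 1)]
```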

First, count all the words using a bucket: a dictionary where the keys are the words and the values are the number of occurrences.

>>> words = "betty bought a bit of butter but the butter was bitter".split()
>>> bucket = {}
>>> for word in words:
...     if word in bucket:
...         bucket[word] += 1
...     else:
...         bucket[word] = 1
...
>>> bucket
{'betty': 1, 'bought': 1, 'a': 1, 'bit': 1, 'of': 1, 'butter': 2, 'but': 1, 'the': 1, 'was': 1, 'bitter': 1}
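The if/else can be shortened with dict.get, and the resulting counts match what collections.Counter (used in the question) produces. A sketch, assuming words comes from splitting the question's sentence:

```python
from collections import Counter

words = "betty bought a bit of butter but the butter was bitter".split()

bucket = {}
for word in words:
    # get() returns 0 for a word not seen yet, so no if/else is needed.
    bucket[word] = bucket.get(word, 0) + 1

print(bucket['butter'], bucket['a'])  # 2 1
print(bucket == dict(Counter(words)))  # True
```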

Calling sorted on the dictionary with no other arguments sorts its keys alphabetically:

>>> sorted(bucket)
['a', 'betty', 'bit', 'bitter', 'bought', 'but', 'butter', 'of', 'the', 'was']

Then to sort by value, from highest to lowest:

>>> sorted(bucket.items(), key=lambda kv_pair: kv_pair[1], reverse=True)
[('butter', 2), ('betty', 1), ('bought', 1), ('a', 1), ('bit', 1), ('of', 1), ('but', 1), ('the', 1), ('was', 1), ('bitter', 1)]
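Neither step alone gives the order the question asks for, since the value sort leaves tied words in the dict's own order. Because Python's sort is stable, applying the value sort to items that are already in alphabetical order keeps the tied words A-Z. A sketch reusing the bucket contents shown earlier:

```python
bucket = {'betty': 1, 'bought': 1, 'a': 1, 'bit': 1, 'of': 1,
          'butter': 2, 'but': 1, 'the': 1, 'was': 1, 'bitter': 1}

# Alphabetical first: sorting the (word, count) pairs compares the words.
by_word = sorted(bucket.items())
# Stable sort by count, highest first; tied words keep their A-Z order.
by_count = sorted(by_word, key=lambda kv_pair: kv_pair[1], reverse=True)
print(by_count[:3])  # [('butter', 2), ('a', 1), ('betty', 1)]
```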

