
Python list of frequent occurrences in a list of strings

I'm writing a Python function that consumes a list of strings and produces a list of the most frequently occurring items.

For example:

>>> trending(["banana", "trouble", "StarWars", "StarWars", "banana", "chicken", "BANANA"])
["banana", "StarWars"]

but

>>> trending(["banana", "trouble", "StarWars", "Starwars", "banana", "chicken"])
["banana"]

So far, my function produces only the first of the most frequent words rather than a list of all of them. The result also includes the index of that one word.

def trending(slst):
    words = {}
    for word in slst:
        if word not in words:
            words[word] = 0
        words[word] += 1
    return words

How can I fix this function to produce a list of the most frequently occurring items (instead of the first of the most frequently occurring items) and how do I remove the index?

Without Counter, you can build your own counts with a dict and then extract the most frequent items:

def trending(slst):
    count = {}
    items = []

    for item in set(slst):
        count[item] = slst.count(item)

    # Compute the highest count once rather than on every iteration.
    highest = max(count.values()) if count else 0
    for k, v in count.items():
        if v == highest:
            items.append(k)

    return items

Use a Counter:

In [1]: from collections import Counter

In [2]: l = ["banana", "trouble", "StarWars", "StarWars", "banana", "chicken", "BANANA"]

In [3]: Counter(l)
Out[3]: Counter({'StarWars': 2, 'banana': 2, 'BANANA': 1, 'trouble': 1, 'chicken': 1})

With Counter(l).most_common(n) you can get the n most common items.
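
Note that most_common(n) returns exactly n (item, count) pairs, so on its own it does not return every item tied for the top count. Below is a minimal sketch of one way to collect all tied items. The name trending_counter is only illustrative, and returning an empty list for empty input is an assumption:

```python
from collections import Counter


def trending_counter(slst):
    """Return every item tied for the highest count (illustrative helper)."""
    counts = Counter(slst)
    if not counts:
        # Assumption: an empty input yields an empty result.
        return []
    # most_common(1) returns a one-element list of (item, count) pairs.
    top_count = counts.most_common(1)[0][1]
    return [item for item, count in counts.items() if count == top_count]


print(trending_counter(["banana", "trouble", "StarWars", "StarWars", "banana", "chicken", "BANANA"]))
```

The order of the returned items follows the Counter's iteration order, which is insertion order only on Python 3.7 and later.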


Update

Your trending() function does essentially the same counting that Counter does. After counting the word occurrences, you can get the maximum number of occurrences with max(words.values()) and use it to filter your word list:

def trending(slst):
    ...
    max_occ = max(words.values())
    return [word for word, occ in words.items() if occ == max_occ]
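
Combining the counting loop from the question with this filter, a complete version might look like the sketch below. The empty-list guard is an added assumption, since max() raises ValueError on an empty sequence:

```python
def trending(slst):
    # Count occurrences of each word (the loop from the question).
    words = {}
    for word in slst:
        if word not in words:
            words[word] = 0
        words[word] += 1

    # Assumption: an empty input should give an empty list.
    if not words:
        return []

    # Keep every word whose count equals the highest count.
    max_occ = max(words.values())
    return [word for word, occ in words.items() if occ == max_occ]


print(trending(["banana", "trouble", "StarWars", "Starwars", "banana", "chicken"]))
```

Because the comparison is case-sensitive, "StarWars" and "Starwars" are counted as different words, matching the second example in the question.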

The following solution uses only lists. No dictionary, set or other Python collection is used:

def trending(words):
    lcounts = [(words.count(word), word) for word in words]
    lcounts.sort(reverse=True)
    ltrending = []

    for count, word in lcounts:
        if count == lcounts[0][0]:
            if word not in ltrending:
                ltrending.append(word)
        else:
            break

    return ltrending


ltests = [
    ["banana", "trouble", "StarWars", "StarWars", "banana", "chicken", "BANANA"],
    ["banana", "trouble", "StarWars", "Starwars", "banana", "chicken"]]

for test in ltests:
    print(trending(test))

It gives the following output:

['banana', 'StarWars']
['banana']
