
How do you count occurrences in a list in Python?

I'm new to Python and I want to count the number of times each word occurs across all the files. Display each word, the number of times it occurred, and the percentage of the time it occurred. Sort the list so the most frequent word appears first and the least frequent word appears last. I'm working on a small sample right now, just one file, but I can't get it to work right:

from collections import defaultdict

words = "apple banana apple strawberry banana lemon"

d = defaultdict(int)
for word in words.split():
    d[word] += 1

As recommended above, the Counter class from the collections module is definitely the way to go for counting applications.

This solution also addresses the request to count words in multiple files, using fileinput.input() to iterate over the contents of all the filenames specified on the command line (or, if no filenames are specified, to read from STDIN, typically the keyboard).

Finally, it uses a slightly more sophisticated approach for breaking each line into 'words', with a regular expression as the delimiter. As noted in the code, it handles contractions more gracefully (however, it will be confused by apostrophes used as single quotes).

"""countwords.py
   count all words across all files
"""

import fileinput
import re
import collections

# create a regex delimiter that matches one or more characters that
# are neither a word character nor an apostrophe; this allows
# contractions to be treated as a word (eg can't  won't  didn't )
# Caution: this WILL get confused by a line that uses apostrophe
# as a single quote: eg 'hello' would be treated as a 7 letter word

word_delimiter = re.compile(r"[^\w']+")

# create an empty Counter

counter = collections.Counter()

# use fileinput.input() to open and read ALL lines from ALL files
# specified on the command line, or if no files specified on the
# command line then read from STDIN (ie the keyboard or redirect)

for line in fileinput.input():
    for word in word_delimiter.split(line):
        counter[word.lower()] += 1   # count case insensitively

del counter['']   # handle corner case of the occasional 'empty' word

# compute the total number of words using .values() to get a
# view of all the Counter values (ie the individual word counts)
# then pass that view to the sum function, which is able to
# work with any iterable

total = sum(counter.values())

# iterate through the key/value pairs (ie word/word_count) in sorted
# order - the lambda function says sort based on position 1 of each
# word/word_count tuple (ie the word_count) and reverse=True does
# exactly what it says = reverse the normal order so it now goes
# from highest word_count to lowest word_count

print("{:>10s}  {:>8s} {:s}".format("occurs", "percent", "word"))

for word, count in sorted(counter.items(),
                          key=lambda t: t[1],
                          reverse=True):
    print("{:10d} {:8.2f}% {:s}".format(count, count/total*100, word))

Example output:

$ python3 countwords.py
I have a dog, he is a good dog, but he can't fly
^D

    occurs   percent word
         2    15.38% a
         2    15.38% dog
         2    15.38% he
         1     7.69% i
         1     7.69% have
         1     7.69% is
         1     7.69% good
         1     7.69% but
         1     7.69% can't
         1     7.69% fly

And:

$ python3 countwords.py text1 text2
    occurs   percent word
         2    11.11% hello
         2    11.11% i
         1     5.56% there
         1     5.56% how
         1     5.56% are
         1     5.56% you
         1     5.56% am
         1     5.56% fine
         1     5.56% mark
         1     5.56% where
         1     5.56% is
         1     5.56% the
         1     5.56% dog
         1     5.56% haven't
         1     5.56% seen
         1     5.56% him
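The apostrophe caveat noted in the code comments can be seen in a small experiment. The following is only a sketch of one possible workaround, not part of the answer above: it trims leading and trailing apostrophes from each token after splitting, so that 'hello' counts as hello while can't stays intact. The helper name count_words is made up for this illustration.

```python
import re
from collections import Counter

# Same delimiter as in countwords.py: runs of characters that are
# neither word characters nor apostrophes.
word_delimiter = re.compile(r"[^\w']+")

def count_words(lines):
    """Hypothetical helper: count words case-insensitively, trimming
    apostrophes that wrap a token (used as single quotes)."""
    counter = Counter()
    for line in lines:
        for token in word_delimiter.split(line):
            # strip("'") only removes apostrophes at the ends,
            # so the one inside "can't" is preserved
            word = token.strip("'").lower()
            if word:  # skip empty tokens instead of deleting '' later
                counter[word] += 1
    return counter

counts = count_words(["She said 'hello' but I can't hear"])
print(counts)
```

A side effect of this choice is that a plural possessive such as dogs' would be counted as dogs, which may or may not be what you want.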

Using your code, here's a neater approach:

import operator
import sys

# Initializing Dictionary
d = {}
with open(sys.argv[1], 'r') as f:

    # counting number of times each word comes up in list of words (in dictionary)
    for line in f:
        words = line.lower().split()
        # Iterate over each word in line
        for word in words:
            if word not in d:
                d[word] = 1
            else:
                d[word] += 1

n_all_words = sum(d.values())

# Sort the dictionary items by count, most frequent first, using this useful solution
# https://stackoverflow.com/a/613218/10521959
sorted_d = sorted(d.items(), key=operator.itemgetter(1), reverse=True)

# Print percentage occurrence
for k, v in sorted_d:
    print(f'{k} occurs {v} times and is {(100*v/n_all_words):,.2f}% total of words.')

As mentioned in the comments, this is precisely what collections.Counter does:

from collections import Counter

words = 'a b c a'.split()
print(Counter(words).most_common())

From the docs: https://docs.python.org/2/library/collections.html

most_common([n])
Return a list of the n most common elements and their counts
from the most common to the least. If n is omitted or None,
most_common() returns all elements in the counter.
Elements with equal counts are ordered arbitrarily:

>>> Counter('abracadabra').most_common(3)
[('a', 5), ('r', 2), ('b', 2)]
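Note that the excerpt above comes from the Python 2 documentation. On Python 3.7 and later, elements with equal counts are returned in the order they were first encountered, so the tie between 'b' and 'r' is resolved differently there. A quick check, assuming a Python 3.7+ interpreter:

```python
from collections import Counter

# 'b' first appears before 'r' in the string, so on Python 3.7+
# it is listed first among the elements with a count of 2.
top3 = Counter('abracadabra').most_common(3)
print(top3)  # [('a', 5), ('b', 2), ('r', 2)] on Python 3.7+
```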

The most straightforward way to do this is to use the Counter class:

from collections import Counter
c = Counter(words.split())

Output:

Counter({'apple': 2, 'banana': 2, 'strawberry': 1, 'lemon': 1})

To get just the words, or just the counts:

list(c.keys())
list(c.values())

Or put it into a normal dict:

dict(c.items())

Or as a list of tuples:

c.most_common()
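To tie this back to the original question (word, count, and percentage, most frequent first), the pieces above could be combined roughly as follows. This is a minimal sketch using the sample string from the question; the report list and the output format are choices made for this illustration only.

```python
from collections import Counter

words = "apple banana apple strawberry banana lemon"

counts = Counter(words.split())
total = sum(counts.values())  # total number of words counted

# most_common() already returns (word, count) pairs sorted from the
# most frequent to the least frequent
report = [(word, n, 100 * n / total) for word, n in counts.most_common()]

for word, n, pct in report:
    print(f"{n:6d} {pct:6.2f}% {word}")
```

For several files, the same Counter can simply be updated with the words of each file before building the report.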
