
Counting words in python from the text file

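I need to open a text file and find the number of occurrences of the names given in another file. The program should write name; count pairs, separated by a semicolon, to a file in .csv format.

It should look like:

Jane; 77

Hector; 34

Anna; 39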

...

I tried to use "Counter", but the result looks like a list, so I think this is the wrong way to do the task:

import re
from collections import Counter

# raw strings avoid the invalid-escape warning for '\w'
with open('iliadcounts.csv') as f:
    wanted = re.findall(r'\w+', f.read().lower())
with open('pg6130.txt') as f:
    words = re.findall(r'\w+', f.read().lower())
cnt = Counter()
for word in words:
    if word in wanted:
        cnt[word] += 1
print(cnt)

But this is definitely not the right code for this task...

You can feed the whole list of words to Counter at once and it will count them for you. You can then print only the wanted words by iterating over wanted:

import re
from collections import Counter

# create some demo data as I do not have your data at hand - uses your filenames
def create_demo_files():
    with open('iliadcounts.csv', "w") as f:
        f.write("hug,crane,box")
    with open('pg6130.txt', "w") as f:
        f.write("hug,shoe,blues,crane,crane,box,box,box,wood")

create_demo_files()


# work with your files
with open('iliadcounts.csv') as f:
    wanted = re.findall(r'\w+', f.read().lower())
with open('pg6130.txt') as f:
    cnt = Counter(re.findall(r'\w+', f.read().lower()))


# printed output for all words in wanted (all words are counted)
for word in wanted:
    # cnt[word] is 0 for a wanted word that never occurs; cnt.get(word) would give None
    print("{}; {}".format(word, cnt[word]))

    # would work as well:
    # https://docs.python.org/3/library/string.html#string-formatting
    # print(f"{word}; {cnt[word]}")

Output:

hug; 1
crane; 2
box; 3
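
The loop above only prints the pairs, while the question asks for a semicolon-separated .csv file. A minimal sketch of that step follows, using the standard csv module. The helper names count_names and write_counts, the output filename name_counts.csv, and the extra name "anna" in the demo data are assumptions made for illustration, not part of the original answer. Note that csv.writer emits "hug;1" without the space after the semicolon shown in the question.

```python
import csv
import re
from collections import Counter


def count_names(names_text, book_text):
    """Return (name, count) pairs for every name listed in names_text."""
    wanted = re.findall(r'\w+', names_text.lower())
    counts = Counter(re.findall(r'\w+', book_text.lower()))
    # A Counter returns 0 for names that never occur in the text
    return [(name, counts[name]) for name in wanted]


def write_counts(pairs, path):
    """Write the pairs to path, one 'name;count' row per line."""
    # newline='' lets the csv module control line endings itself
    with open(path, 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f, delimiter=';')
        writer.writerows(pairs)


# Demo data in place of the real files; "anna" never occurs in the text
pairs = count_names("hug,crane,box,anna",
                    "hug,shoe,blues,crane,crane,box,box,box,wood")
write_counts(pairs, 'name_counts.csv')  # hypothetical output filename
```

With real data you would pass the contents of iliadcounts.csv and pg6130.txt to count_names instead of the demo strings.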

You can also print the whole counter to see the counts for every word in the text:

print(cnt)

Output:

Counter({'box': 3, 'crane': 2, 'hug': 1, 'shoe': 1, 'blues': 1, 'wood': 1})
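
On the concern that a Counter "looks like a list": the printed form lists its items, but a Counter is a dict subclass, so counts are looked up by key. The short sketch below illustrates this with the same demo words; the variable names are chosen for illustration only.

```python
from collections import Counter

cnt = Counter(["hug", "shoe", "blues", "crane", "crane",
               "box", "box", "box", "wood"])

is_mapping = isinstance(cnt, dict)   # True: Counter subclasses dict
crane_count = cnt["crane"]           # 2
missing_count = cnt["anna"]          # 0, a missing key does not raise KeyError
top_two = cnt.most_common(2)         # [('box', 3), ('crane', 2)]
```

Because a missing key yields 0 rather than an error, names from the wanted list that never appear in the text can be reported without any special handling.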

