Python Word Count of Text File

I'm trying to count how often certain words occur in a text file using a Python function. I can get the frequency of every word separately, but I want counts only for specific words that I keep in a list. Here's what I have so far, but I'm currently stuck.

def repeatedWords():
    with open(fname) as f:
        wordcount={}
        for word in word_list:
            for word in f.read().split():
                if word not in wordcount:
                    wordcount[word] = 1
                else:
                    wordcount[word] += 1
            for k,v in wordcount.items():
                 print k, v

word_list =  [‘Emma’, ‘Woodhouse’, ‘father’, ‘Taylor’, ‘Miss’, ‘been’, ‘she’, ‘her’]
repeatedWords('file.txt')

Updated, still showing all words:

def repeatedWords(fname, word_list):
    with open(fname) as f:
        wordcount = {}
        for word in word_list:
            for word in f.read().split():
                wordcount[word] = wordcount.get(word, 0) + 1

        for k,v in wordcount.items():
            print k, v

word_list = ['Emma', 'Woodhouse', 'father', 'Taylor', 'Miss', 'been', 'she', 'her']
repeatedWords('Emma.txt', word_list)

So you want the frequency of only the specific words in that list (Emma, Woodhouse, father, ...)? If so, this code might help (try running it):

    word_list = ['Emma','Woodhouse','father','Taylor','Miss','been','she','her']
    #i'm using this example text in place of the file you are using
    text = 'This is an example text. It will contain words you are looking for, like Emma, Emma, Emma, Woodhouse, Woodhouse, Father, Father, Taylor,Miss,been,she,her,her,her. I made them repeat to show that the code works.'
    text = text.replace(',',' ') #these statements remove irrelevant punctuation
    text = text.replace('.','')
    text = text.lower() #this makes all the words lowercase, so that capitalization won't affect the frequency measurement

    for repeatedword in word_list:
        counter = 0 #counter starts at 0
        for word in text.split():
            if repeatedword.lower() == word:
                counter = counter + 1 #add 1 every time there is a match in the list
        print(repeatedword,':', counter) #prints the word from 'word_list' and its frequency

The output shows the frequency of only the words in the list you provided, which is what you wanted, right?

The output produced when run in Python 3 is:

    Emma : 3
    Woodhouse : 2
    father : 2
    Taylor : 1
    Miss : 1
    been : 1
    she : 1
    her : 3
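
As an alternative to the two nested loops above, the same per-word totals can be computed in a single pass with collections.Counter from the standard library. This is only a sketch under a few assumptions: the function name count_listed_words is made up for illustration, matching is case-insensitive as in the code above, and a word is taken to be any run of letters and apostrophes.

```python
import re
from collections import Counter


def count_listed_words(text, word_list):
    """Return {word: count} for each word in word_list, ignoring case."""
    # Pull out runs of letters/apostrophes from the lowercased text,
    # which drops commas, periods and other punctuation.
    tokens = re.findall(r"[a-z']+", text.lower())
    totals = Counter(tokens)
    # A Counter returns 0 for missing keys, so absent words are reported as 0.
    return {word: totals[word.lower()] for word in word_list}


sample = "Emma, Emma, Emma, Woodhouse, Woodhouse, Father, Father, Taylor,Miss,been,she,her,her,her."
word_list = ['Emma', 'Woodhouse', 'father', 'Taylor', 'Miss', 'been', 'she', 'her']
for word, count in count_listed_words(sample, word_list).items():
    print(word, ':', count)
```

To run it against a file instead of the sample string, read the file once (for example with open('Emma.txt') as f: text = f.read()) and pass text to the function.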

A good way to deal with this is to use the get method of Python dictionaries. It can look like this:

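def repeatedWords(fname, word_list):
    wordcount = {}
    # Example list of words not needed
    nonwordlist = ['father', 'Miss', 'been']
    with open(fname) as f:
        # Read the file once; count a word only if it is wanted and not excluded
        for word in f.read().split():
            if word in word_list and word not in nonwordlist:
                wordcount[word] = wordcount.get(word, 0) + 1
    return wordcount


# Put these outside the function repeatedWords
word_list = ['Emma', 'Woodhouse', 'father', 'Taylor', 'Miss', 'been', 'she', 'her']
for k, v in repeatedWords('Emma.txt', word_list).items():
    print(k, v)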
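
A side note on the nested loops in the question: calling f.read() inside a loop over word_list does not re-read the file on each pass. The first call consumes the whole file, and later calls return an empty string. The inner loop also reuses the name word, so the list never filters anything. The small sketch below illustrates the first point, using io.StringIO as a stand-in for an open text file (an assumption made only so the example runs without a file on disk).

```python
import io

# io.StringIO stands in for an open text file so the example needs no file on disk
f = io.StringIO("Emma Woodhouse Emma")

first = f.read().split()   # ['Emma', 'Woodhouse', 'Emma']
second = f.read().split()  # [] because the read position is already at the end

print(first)
print(second)
```

Reading the contents once into a variable before looping avoids the problem.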

The same get pattern, applied to the list itself, looks like this:

word_list = ['Emma', 'Woodhouse', 'father', 'Taylor', 'Miss', 'been', 'she', 'her']
newDict = {}
for newWord in word_list:
    newDict[newWord] = newDict.get(newWord, 0) + 1

print(newDict)

The line wordcount[word] = wordcount.get(word, 0) + 1 first looks up word in the dictionary wordcount. If the word is already there, get returns its current count and 1 is added to it. If the word is not there, get returns the default 0, and adding 1 records the first occurrence of that word with a count of 1.
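
To make the behaviour of get concrete, here is a minimal sketch that counts the same words twice: once with get and once with collections.defaultdict(int), which supplies the 0 default automatically. The function names count_with_get and count_with_defaultdict are hypothetical and exist only for this illustration.

```python
from collections import defaultdict


def count_with_get(words):
    counts = {}
    for word in words:
        # get() returns 0 when the key is absent, so no membership test is needed
        counts[word] = counts.get(word, 0) + 1
    return counts


def count_with_defaultdict(words):
    counts = defaultdict(int)  # missing keys start at int() == 0
    for word in words:
        counts[word] += 1
    return dict(counts)


words = "she saw her father and her friend".split()
print(count_with_get(words))
print(count_with_defaultdict(words) == count_with_get(words))
```

Both versions should produce the same dictionary; which one to use is mostly a matter of taste.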
