
Using the count method to count a certain word in a text file

I'm trying to count the number of times the word 'the' appears in two books saved as text files. The code I'm running returns zero for each book.

Here's my code:

def word_count(filename):
    """Count specified words in a text"""
    try:
        with open(filename) as f_obj:
            contents = f_obj.readlines()
            for line in contents:
                word_count = line.lower().count('the')
            print (word_count)

    except FileNotFoundError:
        msg = "Sorry, the file you entered, " + filename + ", could not be found."
        print (msg)

dracula = 'C:\\Users\\HP\\Desktop\\Programming\\Python\\Python Crash Course\\TEXT files\\dracula.txt'
siddhartha = 'C:\\Users\\HP\\Desktop\\Programming\\Python\\Python Crash Course\\TEXT files\\siddhartha.txt'

word_count(dracula)
word_count(siddhartha)

What am I doing wrong here?

You are re-assigning word_count on each iteration. That means that at the end it holds only the number of occurrences of 'the' in the last line of the file. You should be summing the per-line counts instead. Another thing: should 'there' match? Probably not. You probably want to use line.split(), so that only whole tokens are compared. Also, you can iterate through a file object directly; there is no need for .readlines(). Lastly, you can use a generator expression to simplify. My first example is without the generator expression; the second is with it:

def word_count(filename):
    with open(filename) as f_obj:
        total = 0
        for line in f_obj:
            total += line.lower().split().count('the')
        print(total)

def word_count(filename):
    with open(filename) as f_obj:
        total = sum(line.lower().split().count('the') for line in f_obj)
        print(total)
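
To make the 'there' point concrete, here is a small sketch on an in-memory string rather than a file. The sample sentence and the variable names are made up for illustration; they are not from the books in the question.

```python
# Hypothetical sample text, chosen so that "the" also hides inside other words.
sample = "The other day, there was the theme of the play.\n"

# Substring counting: also matches "the" inside "other", "there" and "theme".
substring_hits = sample.lower().count('the')

# Token counting: only whitespace-separated tokens exactly equal to "the".
token_hits = sample.lower().split().count('the')

print(substring_hits, token_hits)
```

Note that split() only splits on whitespace, so a token such as "the," or "the." would still be missed; stripping punctuation first is one way to handle that.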

Unless the word 'the' appears on the last line of each file, you'll see zeros.

You likely want to initialize the word_count variable to zero, then use augmented addition (+=). For example:

def word_count(filename):
    """Count specified words in a text"""
    try:
        word_count = 0                                       # <- change #1 here
        with open(filename) as f_obj:
            contents = f_obj.readlines()
            for line in contents:
                word_count += line.lower().count('the')      # <- change #2 here
            print(word_count)

    except FileNotFoundError:
        msg = "Sorry, the file you entered, " + filename + ", could not be found."
        print(msg)

dracula = 'C:\\Users\\HP\\Desktop\\Programming\\Python\\Python Crash Course\\TEXT files\\dracula.txt'
siddhartha = 'C:\\Users\\HP\\Desktop\\Programming\\Python\\Python Crash Course\\TEXT files\\siddhartha.txt'

word_count(dracula)
word_count(siddhartha)

Augmented addition isn't necessary, just helpful. This line:

word_count += line.lower().count('the')

could be written as

word_count = word_count + line.lower().count('the')

But you also don't need to read all the lines into memory at once. You can iterate over the lines directly from the file object. For example:

def word_count(filename):
    """Count specified words in a text"""
    try:
        word_count = 0
        with open(filename) as f_obj:
            for line in f_obj:                     # <- change here
                word_count += line.lower().count('the')
        print(word_count)

    except FileNotFoundError:
        msg = "Sorry, the file you entered, " + filename + ", could not be found."
        print(msg)

dracula = 'C:\\Users\\HP\\Desktop\\Programming\\Python\\Python Crash Course\\TEXT files\\dracula.txt'
siddhartha = 'C:\\Users\\HP\\Desktop\\Programming\\Python\\Python Crash Course\\TEXT files\\siddhartha.txt'

word_count(dracula)
word_count(siddhartha)
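
The loop above still counts substrings, so 'there' and 'other' contribute to the total. If whole-word matches are what you want, one possible variation is sketched below. The name count_word, its word parameter, and the choice to return the total instead of printing it are assumptions for illustration, not part of the original answers.

```python
import re

def count_word(filename, word='the'):
    """Return how many times `word` occurs as a whole word in the file."""
    # \b marks a word boundary, so "there" and "other" are not counted.
    # re.IGNORECASE replaces the explicit .lower() call.
    pattern = re.compile(r'\b' + re.escape(word) + r'\b', re.IGNORECASE)
    total = 0
    with open(filename) as f_obj:
        for line in f_obj:  # iterate lazily, one line at a time
            total += len(pattern.findall(line))
    return total
```

A call such as print(count_word(dracula)) would then print the total, and the FileNotFoundError handling from the examples above can be wrapped around that call in the same way.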

Another way:

with open(filename) as f_obj:
    contents = f_obj.read()
    print("The word 'the' appears " + str(contents.lower().count('the')) + " times")

import os
def word_count(filename):
    """Count specified words in a text"""
    if os.path.exists(filename):
        if not os.path.isdir(filename):
            with open(filename) as f_obj:
                print(f_obj.read().lower().count('the'))
        else:
            print("is path to folder, not to file '%s'" % filename)
    else:
        print("path not found '%s'" % filename)
