
Issues trying to count the words in a text file

I am trying to get this program working, but I am running into some issues. I want this code to count the total number of words in this text file. This is what I have so far:

openfile = input('Enter the input file: ')

accumulator = 0
accumulator2 = 0
accumulator3 = 0
accumulator4 = 0
word = 'PMID'
word2 = 'LA'
word3 = 'PT  - Journal'
try:
    # Read the file once; the with block closes it automatically.
    with open(openfile, 'r') as f:
        lines = f.readlines()

    for line in lines:
        if word in line:
            accumulator += 1

        if word2 in line:
            accumulator2 += 1

        # Count the PT lines with word3 (the original tested word2 twice).
        if word3 in line:
            accumulator3 += 1

        # This checks a single line, so it cannot see LA and PT tags
        # that sit on different lines of the same block.
        if 'Journal' in line and 'LA' in line:
            accumulator4 += 1

    print('there are:', accumulator, 'PMID')
    print('there are:', accumulator2, 'LA')
    print('there are:', accumulator3, 'PT')
    print('there are:', accumulator4, 'PT and LA')

except FileNotFoundError:
    print('Input file not found.')
    print('Please check the file name or the location of your input file.')

I also want it to count the text blocks that contain both "LA - eng" and "PT - Journal Article" as one (like the third block). Is there a way to do this even though they are on different lines? Thank you so much!

This method, which I picked up from a Pluralsight course some time ago, prints the number of occurrences of each word in the file. This is useful when working with sentiment analysis algorithms:

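def count_words(filename):
    # Map each whitespace-separated word to the number of times it appears.
    results = dict()
    with open(filename, 'r') as f:
        for line in f:
            for word in line.split():
                results[word] = results.get(word, 0) + 1

    # Print the words from least to most frequent.
    for word, count in sorted(results.items(), key=lambda x: x[1]):
        print('{} {}'.format(count, word))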
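
As an aside, the same tally can probably be written more compactly with `collections.Counter` from the standard library, which also makes it easy to get the total word count the question asks about. This is only a sketch: the function name `count_words_counter` is hypothetical, and "word" here means any whitespace-separated token, so punctuation stays attached to words.

```python
from collections import Counter


def count_words_counter(text):
    """Return (total_words, per_word_counts) for whitespace-separated words."""
    # Counter tallies each distinct token produced by str.split().
    counts = Counter(text.split())
    # The total is the sum of all individual counts.
    total = sum(counts.values())
    return total, counts


# Possible usage with a file (the file name is an assumption):
# with open('word_file.txt', 'r') as f:
#     total, counts = count_words_counter(f.read())
# print(total)
# print(counts.most_common(5))
```

Returning the values instead of printing them keeps the function easy to reuse and to test.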

You can run `count_words` like this in PyCharm Community Edition, with the file in the same directory as the project. Make sure the function is defined above the main check:

if __name__ == '__main__':
    count_words('word_file.txt')
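
For the second part of the question (counting blocks where "LA - eng" and "PT - Journal Article" appear on different lines), one possible approach is to split the text into blocks first and then test each block as a whole. The sketch below assumes that blocks are separated by blank lines and that each field looks like `TAG - value`; the question does not show the file, so both are assumptions. The names `has_field` and `count_records` are hypothetical.

```python
import re


def has_field(lines, tag, value):
    """Return True if any line has the given tag and contains value."""
    for line in lines:
        # Split 'PT  - Journal Article' into tag part and value part.
        head, sep, rest = line.partition('-')
        if sep and head.strip() == tag and value in rest:
            return True
    return False


def count_records(text):
    """Return (total_blocks, blocks_with_both_fields)."""
    # Assumption: blocks are separated by one or more blank lines.
    blocks = [b for b in re.split(r'\n\s*\n', text) if b.strip()]
    both = 0
    for block in blocks:
        lines = block.splitlines()
        # Both fields must be present somewhere in the same block.
        if has_field(lines, 'LA', 'eng') and has_field(lines, 'PT', 'Journal Article'):
            both += 1
    return len(blocks), both


if __name__ == '__main__':
    # Made-up sample data, used only to illustrate the call.
    sample = (
        'PMID- 1\nLA  - eng\nPT  - Journal Article\n\n'
        'PMID- 2\nLA  - fre\nPT  - Journal Article\n\n'
        'PMID- 3\nLA  - eng\nPT  - Review\n'
    )
    print(count_records(sample))
```

To use it on a real file, read the whole file with `f.read()` and pass the resulting string to `count_records`.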
