Python - Counting Words In A Text File

I'm new to Python and am working on a program that counts the occurrences of each word in a simple text file. The name of the text file is passed on the command line, so I have imported sys to read the command-line arguments. The code is below:

import sys

count={}

with open(sys.argv[1],'r') as f:
    for line in f:
        for word in line.split():
            if word not in count:
                count[word] = 1
            else:
                count[word] += 1

print(word,count[word])

file.close()

count is a dictionary that stores the words and the number of times they occur. I want to print each word and the number of times it occurs, from most occurrences to fewest.

I'd like to know if I'm on the right track, and if I'm using sys properly. Thank you!!

What you did looks fine to me. You could also use collections.Counter (assuming you are on Python 2.7 or newer) to get a bit more information, such as the count for each word. My solution would look like this; some improvement is probably possible.

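import sys
from collections import Counter

c = Counter()
# Open the file named on the command line; the with block closes it for us
with open(sys.argv[1], 'r') as f:
    for line in f:
        # update() counts each element of an iterable, so pass the list of
        # words; passing a single word would count its letters instead
        c.update(line.split())

# most_common() yields (word, count) pairs from most to least frequent
for word, n in c.most_common():
    print('%s %d' % (word, n))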
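
One detail worth knowing about Counter.update(): it takes an iterable and counts each element, so a bare string is counted character by character. A minimal sketch with made-up input (the names by_char, by_word and top are only for illustration):

```python
from collections import Counter

by_char = Counter()
by_char.update('spam')                      # a string is an iterable of characters

by_word = Counter()
by_word.update('spam eggs spam'.split())    # a list of words counts whole words

top = by_word.most_common(1)                # (word, count) pairs, most frequent first
print(by_char)
print(by_word)
print(top)
```

So when counting words, pass update() the list returned by split() rather than each word on its own.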

Your final print isn't inside a loop, so it only prints the count for the last word you read, which is still the value of word.

Also, with a with context manager, you don't need to close() the file handle.
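
A quick way to see this, as a minimal sketch: the temporary file and the names path, closed_inside and closed_after exist only to make the example self-contained and are not part of the question's program.

```python
import os
import tempfile

# Create a small throwaway file so the example runs anywhere (made-up data)
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'w') as out:
    out.write('spam eggs spam\n')

with open(path, 'r') as f:
    first_line = f.readline()
    closed_inside = f.closed    # still open inside the block

closed_after = f.closed         # the with block closed it on exit

os.remove(path)                 # clean up the throwaway file
print(closed_inside, closed_after)
```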

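Finally, as pointed out in a comment, you may want to remove the trailing newline from each line before you split. A no-argument split() already discards it, because it splits on any whitespace, but it matters as soon as you split on an explicit separator.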
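
To illustrate what stripping changes, here is a small sketch; the sample line is made up.

```python
line = 'the quick  brown fox\n'

# No-argument split() splits on runs of any whitespace and drops the newline
words_default = line.split()

# Splitting on an explicit space keeps the newline on the last word and
# yields an empty string for the doubled space
words_space = line.split(' ')

# Stripping first removes the trailing newline before the explicit split
words_stripped = line.strip().split(' ')

print(words_default)
print(words_space)
print(words_stripped)
```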

For a simple program like this it's probably not worth the trouble, but you might want to look at defaultdict from collections to avoid the special case for initializing a new key in the dictionary.
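
As a rough sketch of what that could look like (the list lines is a hypothetical stand-in for the lines read from the file):

```python
from collections import defaultdict

# Made-up sample text standing in for the lines of the input file
lines = ['the cat sat', 'the cat ran']

count = defaultdict(int)   # missing keys start at int(), which is 0
for line in lines:
    for word in line.split():
        count[word] += 1   # no "if word not in count" branch needed

print(dict(count))
```

The trade-off is that merely looking up a missing key in a defaultdict inserts it, so use count.get(word) or "word in count" when you only want to check.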

I just noticed a typo: you open the file as f but you close it as file. As tripleee said, you don't need to close files that you open in a with statement. Also, it's bad practice to use the names of builtins, like file (a builtin in Python 2) or list, for your own identifiers. Sometimes it works, but sometimes it causes nasty bugs, and it's confusing for people who read your code. A syntax-highlighting editor can help you avoid this little problem.
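
Here is a small sketch of the kind of bug that shadowing a builtin can cause; the function names shadowed and not_shadowed are made up for the example.

```python
def shadowed():
    list = ['a', 'b']            # local name hides the builtin list
    try:
        return list('cd')        # tries to call a list object
    except TypeError as exc:
        return str(exc)

def not_shadowed():
    items = ['a', 'b']           # a different name leaves the builtin alone
    return items + list('cd')

result_shadowed = shadowed()
result_ok = not_shadowed()
print(result_shadowed)
print(result_ok)
```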

To print the data in your count dict in descending order of count, you can do something like this:

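# In Python 3, items() returns a view, so copy it into a list before sorting
items = list(count.items())
# Sort the (word, count) pairs by count, largest first
items.sort(key=lambda kv: kv[1], reverse=True)
print('\n'.join('%s: %d' % (k, v) for k, v in items))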

See the Python Library Reference for more details on the list.sort() method and other handy dict methods.
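
One possible variation, not part of the answer above: the builtin sorted() accepts the same key argument, and a tuple key can break ties alphabetically so that words with equal counts come out in a predictable order. The count values below are made up.

```python
# Hypothetical counts standing in for the question's count dictionary
count = {'pear': 2, 'apple': 3, 'fig': 2, 'kiwi': 1}

# Sort by descending count, then alphabetically among equal counts
ordered = sorted(count.items(), key=lambda kv: (-kv[1], kv[0]))

for word, n in ordered:
    print('%s: %d' % (word, n))
```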

I just did something similar using the re library. My program computes the average number of words per line in a text file, which means finding both the total number of words and the number of lines.

import re

# This program prints the average number of words per line in a file
def main():
    # get the name of the file
    filename = input('Enter a filename:')
    try:
        # open the file and read its contents; with closes it afterwards
        with open(filename, 'r') as infile:
            contents = infile.read()
    except IOError:
        print('An error occurred when trying to read')
        print('the file', filename)
        return

    # count lines; splitlines() also counts a last line with no trailing newline
    lines = len(contents.splitlines())
    # count words as runs of word characters
    count = len(re.findall(r'\w+', contents))
    # avoid dividing by zero on an empty file
    average = count // lines if lines else 0

    # display the file contents
    print(contents)
    print('there is an average of', average, 'words per line')

# call main
main()
