Counting Paragraph and Most Frequent Words in Python Text File

I am trying to count the number of paragraphs and the most frequent words in a text file (any text file, for that matter), but I get no output when I run my code, and no errors either. Any tips on where I'm going wrong?

filename = input("enter file name: ")
inf = open(filename, 'r')
#frequent words 
wordcount={}
for word in inf.read().split():
    if word not in wordcount:
        wordcount[word] = 1
    else:
        wordcount[word] += 1
for key in wordcount.keys():
    print ("%s %s " %(key , wordcount[key]))

#Count Paragraph(s)
linecount = 0
for i in inf:
   paragraphcount = 0
   if '\n' in i:
      linecount += 1
   if len(i) < 2: paragraphcount *= 0
   elif len(i) > 2: paragraphcount = paragraphcount + 1
   print('%-4d %4d %s' % (paragraphcount, linecount, i))  
inf.close()

filename = input("enter file name: ")

wordcount = {}
paragraphcount = 0
in_paragraph = False
with open(filename, 'r') as ftext:
    # Read the file once; count words and paragraphs in the same pass.
    for line in ftext:
        if line.strip() == '':
            # A blank line ends the current paragraph.
            in_paragraph = False
        else:
            if not in_paragraph:
                # First non-blank line after a gap starts a new paragraph.
                paragraphcount += 1
                in_paragraph = True
            # frequent words
            for word in line.split():
                wordcount[word] = wordcount.get(word, 0) + 1

print(wordcount)
print(paragraphcount)

When you read a file, there is a cursor that indicates which byte you are currently reading. In your code, you try to read the file twice and run into strange behavior, which should have been a hint that something is wrong. On to the solution.

What is the correct way?

You should read the file once, store every line, and then compute both the word count and the paragraph count from that same store, rather than trying to read the file twice.
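
A minimal sketch of that read-once idea, not a definitive implementation. It assumes a paragraph is a run of non-blank lines separated by one or more blank lines, and that words are whitespace-separated tokens; the question does not specify either. The helper name `analyze_lines` and the sample lines are hypothetical.

```python
from collections import Counter


def analyze_lines(lines):
    """Return (paragraph_count, word_counts) for an iterable of text lines.

    Assumption: a paragraph is a run of non-blank lines, and words are
    whitespace-separated tokens (case and punctuation are left untouched).
    """
    word_counts = Counter()
    paragraph_count = 0
    in_paragraph = False
    for line in lines:
        if line.strip():
            if not in_paragraph:
                # First non-blank line after a gap starts a new paragraph.
                paragraph_count += 1
                in_paragraph = True
            word_counts.update(line.split())
        else:
            # A blank line closes the current paragraph, if any.
            in_paragraph = False
    return paragraph_count, word_counts


# Hypothetical sample standing in for the stored lines of a real file,
# which could be obtained with: lines = open(filename).readlines()
sample_lines = [
    "the cat sat\n",
    "on the mat\n",
    "\n",
    "\n",
    "the end\n",
]
paragraphs, counts = analyze_lines(sample_lines)
print(paragraphs)              # number of paragraphs
print(counts.most_common(1))   # the single most frequent word and its count
```

Because both results come from the same list of lines, the file is only read once and the cursor position never matters.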

What is happening in the current code?

When you first read the file with inf.read(), the cursor is left at the end of the file. When you then try to read lines, you get nothing back, because there is nothing left to read past the end of the file. You can correct this by resetting the file pointer (the cursor).

Call inf.seek(0) just before you try to read lines. But instead of this, you should focus on implementing the method I mentioned in the first section.
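
To illustrate the cursor behavior, here is a small sketch that writes a throwaway file (the file contents and the variable names other than `inf` are made up for the example), reads it fully, and then iterates over it before and after rewinding.

```python
import os
import tempfile

# Write a small throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("first paragraph\n\nsecond paragraph\n")
    path = tmp.name

with open(path, "r") as inf:
    words = inf.read().split()       # consumes the whole file
    lines_without_seek = list(inf)   # cursor is at the end: nothing left
    inf.seek(0)                      # rewind the cursor to the start
    lines_after_seek = list(inf)     # the lines are available again

os.remove(path)

print(len(words))
print(len(lines_without_seek))
print(len(lines_after_seek))
```

The second read yields nothing until `seek(0)` is called, which matches the silent "no output, no error" behavior of the paragraph loop in the question.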
