
Finding maximum and minimum number of words among the sentences in the input file

I have a question that asks me to find the minimum and maximum number of words per sentence in a text file. I've finished three of the five questions; the two that are left ask for these min and max values, and I can't come up with a solution for them. Here's my code. Thanks for your help!

lines, blanklines, sentences, words = 0, 0, 0, 0
print '-' * 50
full_text = 'input.txt'
empty_text = 'output.txt'

text_file = open(full_text, 'r')
out_file = open(empty_text, "w")


for line in text_file:
  print line
  lines += 1

  if line.startswith('\n'):
    blanklines += 1
  else:
    # assume that each sentence ends with . or ! or ?
    # so simply count these characters
    sentences += line.count('.') + line.count('!') + line.count('?')

    # create a list of words: split() with no argument (same as None)
    # splits at any run of whitespace, so for instance a double space
    # counts as one separator; add its length to the word total
    words += len(line.split())
# guard against a file with no sentence-ending punctuation
average = float(words) / sentences if sentences else 0.0

text_file.close()
out_file.close()

######## T E S T   P R O G R A M ########

print
print '-' * 50
print "Total number of sentences in the input file  : ", sentences
print "Total number of words in the input file      : ", words
print "Average number of words per sentence         : ", average
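The code above only keeps running totals, so it never sees how many words each individual sentence has, which is what the min and max need. Below is a minimal sketch (Python 3, unlike the Python 2 code above) that keeps the question's assumption that a sentence ends with . or ! or ?, but records one word count per sentence, so a sentence may also span several lines. The function name `sentence_word_counts` is hypothetical, and it only treats a mark as a sentence end when it is the last character of a whitespace-separated token, so abbreviations such as "e.g." will still split a sentence.

```python
def sentence_word_counts(lines):
    """Return a list with the number of words in each sentence.

    Assumption: a sentence ends with a token whose last character is
    '.', '!' or '?'. Sentences may continue across line breaks.
    """
    counts = []
    current = 0  # words seen so far in the sentence being read
    for line in lines:
        for token in line.split():  # split at any run of whitespace
            current += 1
            if token.endswith(('.', '!', '?')):
                counts.append(current)  # sentence finished
                current = 0
    if current:  # trailing words with no final terminator
        counts.append(current)
    return counts


text = [
    "The cat sat on the mat. It was\n",
    "very happy! Why?\n",
    "\n",
    "No terminator here\n",
]
counts = sentence_word_counts(text)
print("Words per sentence:", counts)
if counts:  # min()/max() raise ValueError on an empty list
    print("Minimum words in a sentence:", min(counts))
    print("Maximum words in a sentence:", max(counts))
```

On the real file the same function accepts the open file object directly, e.g. `with open('input.txt') as f: counts = sentence_word_counts(f)`, since iterating a file yields its lines.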

You can use a regular expression to find the words, like this:

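import re

thefilepath = 'input.txt'  # path to the input file

with open(thefilepath) as f:
    text = f.read()

# a word is a run of letters, digits, underscores, apostrophes or hyphens
words = re.findall(r"[\w'-]+", text)
# split at . ! or ? and keep only the pieces that contain a word
sentences = [s for s in re.split(r"[.!?]", text) if re.search(r"[\w'-]", s)]

summ = 0
for s in sentences:
    words_in_sent = re.findall(r"[\w'-]+", s)
    summ += len(words_in_sent)

print "Total number of sentences in the input file: {0}\nTotal number of words in the input file: {1}\nAverage number of words per sentence: {2}".format(len(sentences), len(words), float(summ) / len(sentences))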
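The same regex idea gives the min and max once you keep one word count per sentence instead of a running sum. A minimal sketch, assuming Python 3 and that a run of one or more of . ! ? ends a sentence (so "??" or "..." counts as a single break); `words_per_sentence` is an illustrative name, not a library function. Passing the whole file contents, e.g. `open('input.txt').read()`, lets sentences span lines.

```python
import re

WORD_RE = re.compile(r"[\w'-]+")         # same word pattern as in the answer above
SENTENCE_END_RE = re.compile(r"[.!?]+")  # one or more terminators end a sentence


def words_per_sentence(text):
    """Return the word count of every non-empty sentence in text."""
    counts = []
    for piece in SENTENCE_END_RE.split(text):
        n = len(WORD_RE.findall(piece))
        if n:  # skip empty pieces, e.g. the one after the final terminator
            counts.append(n)
    return counts


sample = "Hello world. This is a test! Is it working?? Yes."
counts = words_per_sentence(sample)
print(counts)
print("min:", min(counts), "max:", max(counts))
print("average:", sum(counts) / len(counts))
```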

Use collections.Counter, a dict subclass made for counting; here it counts how often each word occurs:

>>> from collections import Counter
>>> lines="""
... foo bar baz hello world foo
... a b c z d
... 0 foo 1 bar"""
>>> counter = Counter()
>>> 
>>> for line in lines.split("\n"):
...     counter.update(line.split())
... 
>>> print counter.most_common(1) #print max
[('foo', 3)]
>>> print counter.most_common()[-1] #print min (only one of the words tied for the lowest count)
('hello', 1)
>>> print len(list(counter.elements()))  #print total words
15
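Several words can share the lowest (or highest) count, so `most_common()[-1]` shows only one of them. In Python 3.7+ elements with equal counts keep the order in which they were first encountered, so for this input it would return `('1', 1)` rather than `('hello', 1)`. A small Python 3 sketch that collects every word tied at each extreme instead; the variable names are illustrative:

```python
from collections import Counter

lines = """
foo bar baz hello world foo
a b c z d
0 foo 1 bar"""

counter = Counter()
for line in lines.split("\n"):
    counter.update(line.split())

# find the extreme counts, then every word that has them
lowest = min(counter.values())
highest = max(counter.values())
most_frequent = sorted(w for w, c in counter.items() if c == highest)
least_frequent = sorted(w for w, c in counter.items() if c == lowest)

print(highest, most_frequent)
print(lowest, least_frequent)
print(sum(counter.values()))  # total number of words
```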
