I have an assignment that asks me to find the min and max number of words in a text file. I've finished three of the five questions; the two that are left ask for the min and max values, and I can't come up with a solution for them. Here's my code. Thanks for your help.
lines, blanklines, sentences, words = 0, 0, 0, 0
print '-' * 50
full_text = 'input.txt'
empty_text = 'output.txt'
text_file = open(full_text, 'r')
out_file = open(empty_text, "w")
for line in text_file:
    print line
    lines += 1
    if line.startswith('\n'):
        blanklines += 1
    else:
        # assume that each sentence ends with . or ! or ?
        # so simply count these characters
        sentences += line.count('.') + line.count('!') + line.count('?')
        # create a list of words
        # use None to split at any whitespace regardless of length
        # so for instance double space counts as one space
        # word total count
        words += len(line.split())
average = float(words) / float(sentences)
text_file.close()
out_file.close()
######## T E S T P R O G R A M ########
print
print '-' * 50
print "Total number of sentences in the input file : ", sentences
print "Total number of words in the input file : ", words
print "Average number of words per sentence : ", average
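You can use a regex to find the words, like this:
import re
total_words = 0
total_sentences = 0
for line in open(thefilepath):
    # split the line into sentences at each period, skipping empty pieces
    sentences = [s for s in re.split(r"\.", line) if s.strip()]
    total_sentences += len(sentences)
    for s in sentences:
        # a word is a run of letters, digits, underscores, apostrophes or hyphens
        words_in_sent = re.findall(r"[\w'-]+", s)
        total_words += len(words_in_sent)
print "Total number of sentences in the input file: {0}\nTotal number of words in the input file: {1}\nAverage number of words per sentence: {2}".format(total_sentences, total_words, float(total_words) / total_sentences)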
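The snippet above gives totals and an average but not the min and max the question asks about. Here is a minimal sketch of one way to get them, assuming "min and max" means the smallest and largest word count per sentence, and assuming (as in the question) that a sentence ends with `.`, `!` or `?`. It is written for Python 3 rather than the Python 2 used above, and the function name `sentence_word_stats` is made up for illustration.

```python
import re

def sentence_word_stats(text):
    """Return (min, max, average) words per sentence, or None if there are none."""
    # Split on sentence-ending punctuation. This is a simplification:
    # abbreviations such as "e.g." will be counted as sentence breaks.
    pieces = re.split(r"[.!?]+", text)
    # Count the words in each piece with the same word pattern as above.
    counts = [len(re.findall(r"[\w'-]+", piece)) for piece in pieces]
    # Drop pieces that contain no words (e.g. the tail after the last period).
    counts = [c for c in counts if c > 0]
    if not counts:
        return None
    return min(counts), max(counts), sum(counts) / len(counts)

sample = "Hello world. This is a longer sentence! Short?"
print(sentence_word_stats(sample))
```

To run it on the file from the question, something like `sentence_word_stats(open('input.txt').read())` should work. Reading the whole file at once lets a sentence span several lines, which the line-by-line loop cannot handle.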
Use collections.Counter, a data type made for this purpose:
>>> from collections import Counter
>>> lines="""
... foo bar baz hello world foo
... a b c z d
... 0 foo 1 bar"""
>>> counter = Counter()
>>>
>>> for line in lines.split("\n"):
...     counter.update(line.split())
...
>>> print counter.most_common(1) #print max
[('foo', 3)]
>>> print counter.most_common()[-1] #print min
('hello', 1)
>>> print len(list(counter.elements())) #print total words
15
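Note that `most_common()[-1]` returns only one of the words tied for the lowest count, and which one you get depends on the Python version. A rough Python 3 sketch of the same idea that collects every least-frequent word, using the sample text from the session above (the variable names are illustrative only):

```python
from collections import Counter

text = """
foo bar baz hello world foo
a b c z d
0 foo 1 bar"""

# split() with no argument splits on any whitespace, including newlines
counter = Counter(text.split())

# most frequent word and its count
most_word, most_count = counter.most_common(1)[0]

# lowest count, and every word that shares it
least_count = min(counter.values())
least_words = [word for word, count in counter.items() if count == least_count]

# total number of words is the sum of all counts
total_words = sum(counter.values())

print(most_word, most_count)
print(least_count, least_words)
print(total_words)
```

This counts how often each word occurs, so "max" and "min" here mean the most and least frequent words, not the longest or shortest sentence.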