I have to write a program that counts how often each word appears in a text file.
So, my plan:
- the user enters the name of a .txt file,
- the app loads it into the variable 'text',
- convert it to lower case,
- keep only words made of letters (no symbols like '/' or '#', no whitespace, only alphabetic strings),
- turn them into a list of words,
- show all the words, from the most frequently used to the least frequently used (every listed word appears at least once).
How can I change it to include only words that are at least 3 characters long? For example, in, on, at should not be included, but list, word, appear, clear should be.
from collections import Counter
import re

def open_file():
    file_name = input("Enter a filename: ")  # the file should exist in the project folder
    with open(file_name) as f:  # the with block closes the file automatically
        text = f.read()  # load the file contents into text
    return text

try:
    text = open_file()  # open the file and store its contents
except FileNotFoundError:
    print("File was not found!")
    text = ""  # fall back to an empty string so the rest of the script still runs

lower_text = text.lower()  # convert the text to lower case
text_with_out_special_signs = re.findall(r'[a-z]+', lower_text)  # keep only runs of letters; '+' (not '*') avoids empty-string matches
counts_of_words = Counter(text_with_out_special_signs)  # count how often each word occurs
for x in counts_of_words.most_common():  # most frequent words first
    print(x)
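The counting steps can be pulled out of the input handling so they can be tried on a plain string without a file. This is a minimal sketch, assuming a hypothetical helper named `count_words`; it uses `[a-z]+` because the pattern `[a-z]*` can also match the empty string, which `Counter` would then count as a "word".

```python
from collections import Counter
import re


def count_words(text):
    """Return (word, count) pairs, most frequent first.

    count_words is a hypothetical helper name; the steps mirror the
    script above: lower-case, extract runs of letters, count them.
    """
    lower_text = text.lower()
    # '+' requires at least one letter, so no empty strings are counted
    words = re.findall(r'[a-z]+', lower_text)
    return Counter(words).most_common()


sample = "The cat and the hat. The END!"
for pair in count_words(sample):
    print(pair)
```

Keeping the file reading separate means the same function works whether the text comes from `open_file()` or from a test string.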
If you want to remove words with fewer than 3 characters, you could filter the list returned by re.findall before counting:
words_with_3_or_more_chars = [w for w in text_with_out_special_signs if len(w) > 2]  # keep words of length 3 or more
counts_of_words = Counter(words_with_3_or_more_chars)  # count only the remaining words
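The filter above can be given a configurable minimum length, and the same cut-off can also be expressed directly in the regular expression. A minimal sketch, assuming hypothetical names `count_long_words`, `count_long_words_regex` and `min_length`; `len(w) > 2` in the answer is the same as `min_length=3`, so three-letter words such as "the" are still counted (use 4 to drop them too).

```python
from collections import Counter
import re


def count_long_words(text, min_length=3):
    """Count words that have at least min_length letters.

    count_long_words and min_length are hypothetical names used for
    illustration; min_length=3 matches the answer's len(w) > 2 test.
    """
    words = re.findall(r'[a-z]+', text.lower())
    long_words = [w for w in words if len(w) >= min_length]
    return Counter(long_words)


def count_long_words_regex(text):
    """Same result as count_long_words(text, 3), filtering in the regex."""
    # {3,} only matches runs of three or more letters, so shorter
    # words like "in", "on", "at" never produce a match
    return Counter(re.findall(r'[a-z]{3,}', text.lower()))


sample = "In, on, at: list word appear clear."
print(count_long_words(sample).most_common())
print(count_long_words_regex(sample).most_common())
```

The list-comprehension version is easier to adjust at runtime (just pass a different `min_length`), while the regex version avoids building the intermediate list.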