Need to write a Python program that analyzes a file and counts:
I've got the code to do the first 2 things:
with open(input('Please enter the full name of the file: '),'r') as f:
w = [len(word) for line in f for word in line.rstrip().split(" ")]
total_w = len(w)
avg_w = sum(w)/total_w
print('The total number of words in this file is:', total_w)
print('The average length of the words in this file is:', avg_w)
But I'm not sure on how to do the others. Any help is appreciated.
Btw, when I say "How many words start with each letter of the alphabet" I mean how many words start with "A", how many start with "B", how many start with "C", etc all the way through to "Z".
Interesting challenge you were given, i made a proposition for question 3, how many times a word occurs inside the string. This code is not optimal at all, but it does work.
also i used the file
text.txt
edit: noticed i forgot to create wordlist as it was saved in my ram memory
with open('text.txt', 'r') as doc:
print('opened txt')
for words in doc:
wordlist = words.split()
for numbers in range(len(wordlist)):
for inner_numbers in range(len(wordlist)):
if inner_numbers != numbers:
if wordlist[numbers] == wordlist[inner_numbers]:
print('word: %s == %s' %(wordlist[numbers], wordlist[inner_numbers]))
Answer to question four: This one wasn't really hard after you have created a list with all the words since strings can be treated like a list and you can easily get the first letter of the string by simply doing
string[0]
and if its a list with stringsstringList[position of word][0]
for numbers in range(len(wordlist)):
if wordlist[numbers][0] == 'a':
print(wordlist[numbers])
There are many ways to achieve this, a more advanced approach would involve an initial simple gathering of the text and its words, then working on the data with ML/DS tools, with which you could extrapolate more statistics (Things like "a new paragraph starts mostly with X words" / "X words are mostly preceeded/succeeded by Y words" etc.)
If you just need very basic statistics you can gather them while iterating over each word and do the calculations at the end of it, like:
stats = {
'amount': 0,
'length': 0,
'word_count': {},
'initial_count': {}
}
with open('lorem.txt', 'r') as f:
for line in f:
line = line.strip()
if not line:
continue
for word in line.split():
word = word.lower()
initial = word[0]
# Add word and length count
stats['amount'] += 1
stats['length'] += len(word)
# Add initial count
if not initial in stats['initial_count']:
stats['initial_count'][initial] = 0
stats['initial_count'][initial] += 1
# Add word count
if not word in stats['word_count']:
stats['word_count'][word] = 0
stats['word_count'][word] += 1
# Calculate average word length
stats['average_length'] = stats['length'] / stats['amount']
Online Demo here
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.