I'm writing a concordance program. I want it to tell me which sentence each word is in, so if I have:
"Hello world. My name is Nathan and I need help on Python. I am very confused and any help is appreciated."
I want it to print which sentence each word comes from. It already counts the total number of times each word appears; next to that count I need the sentence number(s) the word comes from, so it displays as:
a. word {word appearance count:sentence number}
with 'a.' working as the list order (like a numbered list but with letters). An example from the text above would be
a. help {2:2,3}
Here's the code I currently have:
word_counter = {}
sent_num = {}
linenum = 0
wordnum = 0
counter = 0
#not working
for word in f.lower().split('.'):
    if not word in sent_num:
        sent_num[word] = []
    sent_num[word].append(f.find(wordnum))
#working correctly
for word in f.lower().split():
    if not word in word_counter:
        word_counter[word] = []
        #if the word isn't listed yet, adds it
    word_counter[word].append(linenum)
for key in sorted(word_counter):
    counter += 1
    print (counter, key, len(word_counter[key]), len(sent_num[key]))
In your code, when you look at full sentences, you are only splitting on '.', so each "word" in that loop is actually a whole sentence. You need to split each sentence into words after that:
for num, sentence in enumerate(f.split('.'), start=1):
    for word in sentence.lower().split():
        if word not in sent_num:
            sent_num[word] = []
        # record the 1-based number of the sentence this word appears in
        sent_num[word].append(num)
or something along those lines, depending on what you want to look at and count.
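Putting that together with the output format from the question, a minimal sketch could look like the following. The function names (`letter_label`, `build_sentence_index`, `format_concordance`) are made up for illustration, sentences are assumed to end only with '.', and each sentence number is assumed to be listed once even if the word occurs several times in that sentence.

```python
def letter_label(n):
    """Turn a 0-based position into a, b, ..., z, aa, ab, ... (hypothetical helper)."""
    label = ''
    n += 1
    while n > 0:
        n, rem = divmod(n - 1, 26)
        label = chr(ord('a') + rem) + label
    return label


def build_sentence_index(text):
    """Map each lowercased word to the 1-based sentence numbers it occurs in."""
    index = {}
    # Assumption: '.' is the only sentence terminator; empty pieces are skipped.
    sentences = [s for s in text.split('.') if s.strip()]
    for num, sentence in enumerate(sentences, start=1):
        for word in sentence.lower().split():
            index.setdefault(word, []).append(num)
    return index


def format_concordance(index):
    """Build lines of the form 'a. word {count:sentence numbers}'."""
    lines = []
    for pos, word in enumerate(sorted(index)):
        nums = index[word]
        # One list entry per occurrence, so len() is the total count.
        where = ','.join(str(n) for n in sorted(set(nums)))
        lines.append('%s. %s {%d:%s}' % (letter_label(pos), word, len(nums), where))
    return lines


text = ("Hello world. My name is Nathan and I need help on Python. "
        "I am very confused and any help is appreciated.")
index = build_sentence_index(text)
lines = format_concordance(index)
print('\n'.join(lines))
```

Because the words are sorted alphabetically before they are labelled, 'help' ends up with whatever letter matches its position in the sorted list rather than 'a.'.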
It's pretty simple to iterate over each sentence, then each word in that sentence, and create a dictionary that maps {word: [sentence, ...]}:
In [1]:
d = {}
for i, sent in enumerate(f.lower().split('. ')):
    for w in sent.strip().split():
        d.setdefault(w, []).append(i)
d
Out[1]:
{'am': [2],
 'and': [1, 2],
 'any': [2],
 'appreciated.': [2],
 'confused': [2],
 'hello': [0],
 'help': [1, 2],
 ...}
Since the list holds one entry for every occurrence of the word, you can get the count just by calling len() on it, e.g.:
In [2]:
len(d['help'])
Out[2]:
2
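Note that the output above still contains the key 'appreciated.' with its trailing period, because splitting on '. ' does not remove the final full stop. One possible way around that, sketched here with the standard re module, is to split after sentence-ending punctuation and pull out only the letters of each word. The regular expressions are an assumption about what should count as a sentence break and as a word, not part of the original answer.

```python
import re

text = ("Hello world. My name is Nathan and I need help on Python. "
        "I am very confused and any help is appreciated.")

d = {}
# Split wherever whitespace follows '.', '!' or '?' (assumed sentence endings).
for i, sent in enumerate(re.split(r'(?<=[.!?])\s+', text.lower().strip())):
    # Keep only runs of letters and apostrophes, which drops punctuation.
    for w in re.findall(r"[a-z']+", sent):
        d.setdefault(w, []).append(i)

print(d['help'], len(d['help']))
```

Sentence numbers are still 0-based here, as in the session above; pass start=1 to enumerate() if you want them to start at 1.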