I have a folder that contains several subfolders, and each subfolder contains some text files (the language is Persian). I want to print the 5 words before and the 5 words after a keyword, with the keyword in the middle. I wrote the code below, but it prints the words at the start and end of the line rather than the words around the keyword. How can I fix it?
Note: I am only showing the end of the code, which is the part related to this question. The earlier part opens and normalizes the files.
    def c():
        y = "آرامش"
        text = normal_text(folder_path)  # the first function to open and normalize the files
        for i in text:
            for line in i:
                if y in line:
                    z = line.split()
                    print(z[-6], z[-5],
                          z[-4], z[-3],
                          z[-2], z[-1], y,
                          z[+1], z[+2],
                          z[+3], z[+4],
                          z[+5], z[+6])
What I expect is something like this:
word word word word word keyword word word word word word
Each sentence on a new line.
You need to pick the surrounding words relative to the keyword's index, not relative to the ends of the line. Use the list.index() method to get the keyword's position in the list of words, then slice around it:
    for f in normal_text(folder_path):
        for line in f:
            words = line.split()
            # Test against the word list, so that index() cannot raise ValueError
            if keyword in words:
                ind = words.index(keyword)
                print(words[max(0, ind-5):min(ind+6, len(words))])
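Since normal_text() and folder_path are not shown in the question, here is a minimal self-contained sketch of the same slicing idea on a made-up line. The helper name context_window, its width parameter, and the placeholder words w1 to w13 are assumptions for illustration only.

```python
def context_window(words, keyword, width=5):
    """Return up to `width` words on each side of the first `keyword` in `words`."""
    ind = words.index(keyword)  # raises ValueError if the keyword is absent
    # max(0, ...) stops a negative start from wrapping around to the end of the list;
    # a stop past the end of the list is clamped by slicing itself.
    return words[max(0, ind - width):ind + width + 1]


# Hypothetical sample line standing in for one line produced by normal_text().
line = "w1 w2 w3 w4 w5 w6 w7 آرامش w8 w9 w10 w11 w12 w13"
words = line.split()
if "آرامش" in words:
    print(" ".join(context_window(words, "آرامش")))
```

When the keyword is closer than 5 words to either end of the line, the window is simply shorter on that side.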
Alternatively, you can wrap this in a generator function that yields each window of words lazily, so the results are never all held in memory at once:
    def get_words(keyword):
        for f in normal_text(folder_path):
            for line in f:
                words = line.split()
                # Test against the word list, so that index() cannot raise ValueError
                if keyword in words:
                    ind = words.index(keyword)
                    yield words[max(0, ind-5):min(ind+6, len(words))]
Then you can simply loop over the result to print it or process it further:
    y = "آرامش"
    for words in get_words(y):
        # do stuff with each window, for example print it on its own line
        print(' '.join(words))
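One thing to be aware of: list.index() returns only the first position of the keyword, so a line that contains the keyword twice yields a single window. If every occurrence matters, a possible variation (a sketch, not part of the answer above; the name all_windows and the sample words are made up) is to scan the line with enumerate():

```python
def all_windows(words, keyword, width=5):
    """Yield one context window per occurrence of `keyword` in `words`."""
    for ind, word in enumerate(words):
        if word == keyword:
            yield words[max(0, ind - width):ind + width + 1]


# Hypothetical sample line in which the keyword appears twice.
sample = "a b آرامش c d آرامش e".split()
for window in all_windows(sample, "آرامش", width=1):
    print(" ".join(window))
```

With width=1 this produces two windows, one centred on each occurrence of the keyword.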
Try this. It splits the line into words, then works out how many words to show before and after the keyword (at most 5, or fewer if the keyword is near the start or end of the line) and prints them.
    words = line.split()
    if y in words:
        index = words.index(y)
        before = index - min(index, 5)
        after = index + min(len(words) - 1 - index, 5) + 1
        print(words[before:after])
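To see why the lower bound needs explicit clamping while the upper bound does not, here is a small check on a made-up four-word list (the words and the keyword "k" are placeholders). Python slices silently clamp a stop value past the end of the list, but a negative start counts from the end and would drop words:

```python
words = ["a", "b", "k", "c"]
index = words.index("k")  # position 2

# Bounds computed as in the snippet above
before = index - min(index, 5)
after = index + min(len(words) - 1 - index, 5) + 1
clamped = words[before:after]

# Equivalent form: only the start needs max(); the stop may overshoot safely
short_form = words[max(0, index - 5):index + 6]

# Unclamped start: index - 5 is -3, which counts from the end of the list
wrong = words[index - 5:index + 6]

print(clamped, short_form, wrong)
```

Here clamped and short_form both contain all four words, while wrong is missing the first word.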
    def c():
        y = "آرامش"
        text = normal_text(folder_path)  # the first function to open and normalize the files
        for i in text:
            for line in i:
                split_line = line.split()
                if y in split_line:
                    index = split_line.index(y)
                    print(' '.join(split_line[max(0, index-5):min(index+6, len(split_line))]))
This assumes the keyword must match a whole word exactly.
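The exact-word assumption matters when punctuation is attached to the keyword. The sketch below uses a made-up sample line to show the difference between a substring test on the line and a membership test on the split words, and one possible way to handle it by stripping a few punctuation marks from each token. The set of characters to strip is an assumption; whether normal_text() already removes punctuation is not known from the question.

```python
keyword = "آرامش"
# Hypothetical sample line: the keyword is followed directly by a Persian comma.
line = "زندگی با آرامش، زیباست"

in_line = keyword in line            # substring test on the raw line
in_tokens = keyword in line.split()  # whole-token test on the split words

# Strip a few common punctuation marks from both ends of every token.
cleaned = [w.strip("،؛؟.!") for w in line.split()]
in_cleaned = keyword in cleaned

print(in_line, in_tokens, in_cleaned)
```

In this sample the substring test succeeds, the whole-token test fails because the token still carries the comma, and the test on the cleaned tokens succeeds again.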