I'm trying to get this code to find, say, 30 words before a specific word and 30 words after it. Then I want it to write my output to a new file. I can't seem to figure out what I'm doing wrong, as I'm pretty new to Python. Any suggestions are more than welcome.
```python
def extract_text(file_name, to_find):
    file_in = open('School.txt', 'r')
    all_lines = file_in.readlines()
    file_in.close()
    new_text = all_text.replace ('\n', '|')
    width = 30
    to_find = 'boy'
    new_text = all_text.replace ('\n', '|')
    while new_text.find(to_find) != -1:
        start = all_text.find(to_find)
        begin = start - width
        end = start + len(to_find) + width
        print(new_text[begin:end])
        out_put = new_text[begin:end]
        f = open("School_boy.txt","w")
        f.write(out_put)
        f.close()
```
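For comparison with the character-offset approach in the question, here is a minimal sketch that counts words instead. In the question's code, `width = 30` slices 30 characters rather than 30 words, `all_text` is never defined (the file is read into `all_lines`), and the `while` loop never changes `new_text`, so it cannot terminate once the word is found. The sketch below splits the text on whitespace with `str.split()`, takes up to 30 words on each side of every exact match, and writes one snippet per line. The function names `context_snippets` and `save_snippets` are hypothetical, and matching is exact, so `boy,` or `Boy` does not count as `boy`.

```python
def context_snippets(text, target, width=30):
    """Return up to `width` words before and after each exact occurrence of `target`."""
    words = text.split()  # split on any whitespace, so newlines need no special handling
    snippets = []
    for i, word in enumerate(words):
        if word == target:
            # max() stops the slice start going negative near the beginning of the text
            before = words[max(0, i - width):i]
            # slicing past the end of a list is safe; it just returns fewer words
            after = words[i + 1:i + 1 + width]
            snippets.append(" ".join(before + [word] + after))
    return snippets


def save_snippets(in_name, out_name, target, width=30):
    """Read `in_name` and write one snippet per line to `out_name` (hypothetical helper)."""
    with open(in_name, "r") as file_in:
        text = file_in.read()
    snippets = context_snippets(text, target, width)
    # "w" mode creates or overwrites the output file once, after all matches are found
    with open(out_name, "w") as file_out:
        for snippet in snippets:
            file_out.write(snippet + "\n")
    return snippets
```

With the file names from the question, `save_snippets('School.txt', 'School_boy.txt', 'boy')` would write one line per occurrence of `boy`.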
For text parsing, I would recommend using regex:
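```python
import re

# Read the file
with open("file.txt", "r") as file:
    text = file.read()

# replace newlines with spaces; str.replace returns a new string, so keep the result
text = text.replace('\n', ' ')

# parse the text; each outer named group wraps the repeated (?:\w+ ) so it
# captures all 30 words instead of only the last one
result = re.findall(r'(?P<before>(?:\w+ ){30})target (?P<after>(?:\w+ ){30})', text)
```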
From there, the 30 words before the target word (in this example 'target') are captured in the group called 'before' and the 30 words after it in the group called 'after'. Because the pattern has two groups, re.findall returns a list of (before, after) tuples, one per match. RegEx can be really specific or really generic, depending on the pattern used. For example, the code above only allows for exactly one space after each word and no punctuation, and it needs a full 30 words on each side of the target. For a guide on Python regex: https://docs.python.org/3/howto/regex.html
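Since `re.findall` returns plain tuples when a pattern has several groups, reading the groups by name needs `re.finditer`, which yields match objects. Below is a minimal sketch, assuming the same single-space, punctuation-free text as above, that also writes the results to a new file as the question asks. `find_context` and `write_context` are hypothetical names, and the output format (before, target, after on one line) is an illustrative choice; `re.escape` makes the target match literally even if it contains regex symbols.

```python
import re


def find_context(text, target, width=30):
    """Yield (before, after) strings for each match of `target` with `width` words on each side."""
    # str.replace returns a new string; newlines become spaces so words stay separated
    text = text.replace("\n", " ")
    # Outer named groups capture every repetition, not just the last word;
    # re.escape treats the target literally even if it contains regex symbols
    pattern = re.compile(
        r"(?P<before>(?:\w+ ){%d})%s (?P<after>(?:\w+ ){%d})"
        % (width, re.escape(target), width)
    )
    for match in pattern.finditer(text):
        yield match.group("before").strip(), match.group("after").strip()


def write_context(in_name, out_name, target, width=30):
    """Write each match as 'before target after' on its own line (hypothetical helper)."""
    with open(in_name, "r") as file_in:
        text = file_in.read()
    with open(out_name, "w") as file_out:
        for before, after in find_context(text, target, width):
            file_out.write(f"{before} {target} {after}\n")
```

Matches do not overlap, so a second target that falls inside the 30 words after the first one is not reported separately, and a target with fewer than `width` words on either side is not matched at all.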