
Using a keyword to print a sentence in Python

Hello, I am writing a Python program that reads through a given .txt file and looks for keywords. Once the program has found my keyword (for example 'data'), I would like it to print out the entire sentence the word belongs to.

I have read in my input file and used the split() method to get rid of spaces, tabs and newlines and put all the words into a list.

Here is the code I have thus far.

with open("file.txt", "r") as text_file:
    lines = text_file.read().split()
keyword = 'data'

for token in lines:
    if token == keyword:
        # I have found my keyword. What methods can I use to
        # print out the words before and after the keyword?
        # I have a feeling I want to use '.' as a marker for sentences.
        pass  # this is where print(sentence) should print the entire sentence

file.txt reads as follows:

Welcome to SOF! This website securely stores data for the user.

Desired output:

This website securely stores data for the user.

We can just split the text on the characters that mark the end of a sentence, then loop through the resulting lines and print those that contain our keyword.

To split text on multiple characters (for example, a sentence can end with !, ? or .), we can use a regex:

import re

keyword = "data"
line_end_chars = "!", "?", "."
example = "Welcome to SOF! This website securely stores data for the user?"
regexPattern = '|'.join(map(re.escape, line_end_chars))
line_list = re.split(regexPattern, example)

# line_list looks like this:
# ['Welcome to SOF', ' This website securely stores data for the user', '']

# Now we just need to see which lines have our keyword
for line in line_list:
    if keyword in line:
        print(line)

But keep in mind that if keyword in line: matches a sequence of characters, not necessarily a whole word. For example, 'data' in 'datamine' is True. If you only want to match whole words, you ought to use regular expressions.
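As a minimal sketch of that whole-word check, the \b word-boundary anchor can be wrapped around the keyword. The helper name contains_word is made up for this illustration and is not part of the answer above.

```python
import re

def contains_word(keyword, line):
    """Return True only if keyword appears in line as a whole word."""
    # \b marks a word boundary; re.escape protects any special characters in the keyword.
    return re.search(r"\b" + re.escape(keyword) + r"\b", line) is not None

print(contains_word("data", "stores data for the user"))  # True
print(contains_word("data", "a datamine of facts"))       # False
```

Swapping this in for the if keyword in line: test keeps the rest of the loop unchanged.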

Source for regex delimiters

My approach is similar to Alberto Poljak's, but a little more explicit.

The motivation is to realise that splitting on words is unnecessary: Python's in operator will happily find a word in a sentence. What is necessary is splitting the text into sentences. Unfortunately, sentences can end with ., ? or !, and Python's str.split method accepts only a single separator. So we have to get a little complicated and use re.

re requires us to put a | between each delimiter and to escape some of them, because both . and ? have special meanings by default. Alberto's solution used re itself to do all this, which is definitely the way to go. But if you're new to re, my hard-coded version might be clearer.
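To make the escaping point concrete, here is a small sketch comparing three ways of writing the same delimiter pattern: generated with re.escape, escaped by hand, and as a character class (inside [] these three characters need no escaping). The sample text and variable names are chosen only for this illustration.

```python
import re

text = "Welcome to SOF! This website securely stores data for the user. Really?"

generated = "|".join(map(re.escape, ("!", "?", ".")))  # built with re.escape
hard_coded = r"\.|!|\?"                                 # escaped by hand
char_class = r"[.!?]"                                   # literal characters inside []

results = [re.split(pattern, text) for pattern in (generated, hard_coded, char_class)]
print(results[0])
print(results[0] == results[1] == results[2])  # True: all three split identically
```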

The other addition I made was to put each sentence's trailing delimiter back on the sentence it belongs to. To do this I wrapped the delimiters in (), which captures them in the output. I then used zip to put them back on the sentence they came from. The 0::2 and 1::2 slices take every even index (the sentences) and concatenate them with every odd index (the delimiters). Uncomment the print statement to see what's happening.

import re

lines = "Welcome to SOF! This website securely stores data for the user. Another sentence."
keyword = "data"

# A raw string keeps the backslashes intact for the regex engine.
sentences = re.split(r'(\.|!|\?)', lines)

sentences_terminated = [a + b for a,b in zip(sentences[0::2], sentences[1::2])]

# print(sentences_terminated)

for sentence in sentences_terminated:
    if keyword in sentence:
        print(sentence)
        break

Output:

 This website securely stores data for the user.
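Note that the zip-based reassembly keeps the leading space and silently drops a final fragment that has no terminator. One possible alternative, sketched here under the assumption that sentences are separated by whitespace, is to split on the whitespace that follows a terminator by using a lookbehind. The function name split_sentences is hypothetical.

```python
import re

def split_sentences(text):
    # Split on whitespace that is preceded by ., ! or ?;
    # the lookbehind consumes nothing, so each terminator stays on its sentence.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

lines = "Welcome to SOF! This website securely stores data for the user. No terminator here"
keyword = "data"

for sentence in split_sentences(lines):
    if keyword in sentence:
        print(sentence)
```

Here the last fragment, "No terminator here", is still returned as its own item rather than being lost.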

This solution uses a fairly simple regex to find your keyword in a sentence, together with any words before and after it and a final period. It handles spaces well and needs only one call to re.search().

import re

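with open("file.txt", "r") as text_file:
    text = text_file.read()

keyword = 'data'

# Words before the keyword, the keyword, words after it, then a literal period.
# The period is escaped so that it does not match "any character".
pattern = r"\s?(\w+\s)*" + re.escape(keyword) + r"\s?(\w+\s)*\w*\."
match = re.search(pattern, text)
if match:  # re.search returns None when nothing matches
    print(match.group().strip())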
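Because \w does not match commas or apostrophes, a pattern built from words and spaces can miss sentences that contain them. A hedged variation is to match "anything that is not a terminator" around the keyword and to collect every hit with re.finditer. The function name sentences_with and the sample text are assumptions made for this sketch, and abbreviations such as "e.g." would still be cut short.

```python
import re

def sentences_with(keyword, text):
    # [^.!?]* matches any run of non-terminator characters, so commas are allowed.
    # \b keeps the keyword from matching inside a longer word.
    pattern = r"[^.!?]*\b" + re.escape(keyword) + r"\b[^.!?]*[.!?]"
    return [m.group().strip() for m in re.finditer(pattern, text)]

text = ("Welcome to SOF! This website securely stores data for the user. "
        "Is the data safe, really? Yes.")
print(sentences_with("data", text))
```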

Another Solution:

def check_for_stop_punctuation(token):
    """Return True if the token contains sentence-ending punctuation."""
    return any(mark in token for mark in ('.', '?', '!'))

with open("file.txt", "r") as text_file:
    lines = text_file.read().split()
keyword = 'data'

sentence = []

i = 0
while i < len(lines):
    token = lines[i]
    sentence.append(token)
    # Compare without surrounding punctuation so that "data." still matches.
    if token.strip('.?!,;:') == keyword:
        found_stop_punctuation = check_for_stop_punctuation(token)
        # Keep collecting tokens until the sentence ends or the text runs out.
        while not found_stop_punctuation and i + 1 < len(lines):
            i += 1
            token = lines[i]
            sentence.append(token)
            found_stop_punctuation = check_for_stop_punctuation(token)
        # Join the tokens so the sentence prints as text rather than as a list.
        print(' '.join(sentence))
        sentence = []
    elif check_for_stop_punctuation(token):
        # The sentence ended without the keyword; start a new one.
        sentence = []
    i += 1
