Printing the sentence that contains a keyword in Python

Question

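Hello, I am writing a Python program that reads a given .txt file and looks for keywords. Once I have found my keyword (for example 'data'), I would like to print the entire sentence that the word belongs to.

I have already read in the input file and used the split() method to strip spaces, tabs and newlines and put all the words into an array.

Here is my code so far.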

text_file = open("file.txt", "r")
lines = []
lines = text_file.read().split()
keyword = 'data'

for token in lines:
    if token == keyword:
        # I have found my keyword. What methods can I use to
        # print out the words before and after the keyword?
        # I have a feeling I want to use '.' as a marker for sentences.
        print(sentence)  # should print the entire sentence

file.txt reads as follows:

Welcome to SOF! This website securely stores data for the user.

Expected output:

This website securely stores data for the user.

Answer 1

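We can split the text on the characters that mark the end of a sentence, then loop over the resulting pieces (called lines in the code) and print the ones that contain our keyword.

To split the text on more than one character (a sentence can end with !, ? or .), we can use a regular expression: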

import re

keyword = "data"
line_end_chars = "!", "?", "."
example = "Welcome to SOF! This website securely stores data for the user?"
regexPattern = '|'.join(map(re.escape, line_end_chars))
line_list = re.split(regexPattern, example)

# line_list looks like this:
# ['Welcome to SOF', ' This website securely stores data for the user', '']

# Now we just need to see which lines have our keyword
for line in line_list:
    if keyword in line:
        print(line)

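Keep in mind, though, that if keyword in line: matches a sequence of characters, not necessarily a whole word. For example, 'data' in 'datamine' is True. If you only want to match whole words, you should use a regular expression, for instance one built around the \b word-boundary anchor.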
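As a minimal sketch of whole-word matching, assuming a sentence list like the one produced above: the \b anchor lets the pattern match 'data' but not 'datamine'. The helper name contains_word and the sample sentences are made up for illustration.

```python
import re

def contains_word(word, text):
    # \b matches at a word boundary, so "data" does not match inside "datamine".
    # re.escape protects against regex metacharacters in the keyword.
    return re.search(r"\b" + re.escape(word) + r"\b", text) is not None

# Hypothetical sample data in the same shape as line_list above.
line_list = ["Welcome to SOF", " This website stores data for the user", " We run a datamine"]
matches = [line.strip() for line in line_list if contains_word("data", line)]
print(matches)
```

Only the second entry is kept, because the third one contains 'data' only as part of a longer word.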


Answer 2

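My approach is similar to Alberto Poljak's, but a bit more explicit.

The motivation is to realise that splitting on words is unnecessary: Python's in operator will happily find a word inside a sentence. What is necessary is splitting the text into sentences. Unfortunately, sentences can end with ., ? or !, and Python's split function does not allow multiple separators. So we have to get a little more complicated and use re.

re requires us to put a | between each delimiter and to escape some of them, because both . and ? have special meanings by default. Alberto's solution uses re itself to do all of this, which is definitely the way to go. But if you are new to re, my hard-coded version might be clearer.

The other addition I made was to put each sentence's trailing delimiter back onto the sentence it belongs to. To do this, I wrapped the delimiters in (), which captures them in the output. Then I used zip to put them back on the sentence they came from. The 0::2 and 1::2 slices take every even index (the sentences) and join them with every odd index (the delimiters). Uncomment the print statement to see what is happening.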

import re

lines = "Welcome to SOF! This website securely stores data for the user. Another sentence."
keyword = "data"

sentences = re.split(r'(\.|!|\?)', lines)

sentences_terminated = [a + b for a,b in zip(sentences[0::2], sentences[1::2])]

# print(sentences_terminated)

for sentence in sentences_terminated:
    if keyword in sentence:
        print(sentence)
        break

Output:

 This website securely stores data for the user.
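Two details of the approach above are easy to miss: the printed sentence keeps its leading space, and zip silently drops any trailing text that has no closing punctuation. The following is a sketch of one way to handle both, not the answerer's own code; the function name split_sentences and the sample text are assumptions made for illustration.

```python
import re

def split_sentences(text):
    # The capturing group keeps the delimiters in the result:
    # [sentence, delimiter, sentence, delimiter, ..., tail]
    parts = re.split(r"([.!?])", text)
    sentences = [(body + end).strip() for body, end in zip(parts[0::2], parts[1::2])]
    # re.split always leaves one final element after the last delimiter;
    # zip drops it, so add it back when it is not empty.
    tail = parts[-1].strip()
    if tail:
        sentences.append(tail)
    return sentences

text = "Welcome to SOF! This website securely stores data for the user. No final stop here"
result = [s for s in split_sentences(text) if "data" in s]
print(result)
```

Stripping each sentence removes the leading space, and the unterminated last sentence is kept instead of being lost.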

Answer 3

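This solution uses a fairly simple regular expression to find your keyword inside a sentence, together with any words that may come before and after it and a final period character. It copes well with whitespace, and it needs only a single call to re.search().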

import re

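# Read the whole file and close it when done.
with open("file.txt", "r") as text_file:
    text = text_file.read()

keyword = 'data'

# Words before the keyword, the keyword itself, words after it, then one final character.
# Note: the trailing "." is unescaped, so it matches any character, not only a period.
match = re.search(r"\s?(\w+\s)*" + re.escape(keyword) + r"\s?(\w+\s?)*.", text)
if match:  # re.search returns None when the keyword is not found
    print(match.group().strip())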
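Because the final . in that pattern is unescaped, it matches any character rather than only a period, and re.search() returns just the first hit. Below is a sketch of a variant, assuming that sentences end in ., ! or ?; it uses a character class for the sentence boundary and re.findall() to collect every matching sentence. The function name sentences_with and the sample text are hypothetical.

```python
import re

def sentences_with(keyword, text):
    # [^.!?]*      any run of characters that does not end a sentence
    # \bkeyword\b  the keyword as a whole word
    # [.!?]        the closing punctuation mark
    pattern = r"[^.!?]*\b" + re.escape(keyword) + r"\b[^.!?]*[.!?]"
    return [m.strip() for m in re.findall(pattern, text)]

text = "Welcome to SOF! This website securely stores data for the user. Is the data safe? Yes."
print(sentences_with("data", text))
```

Sentences that do not contain the keyword as a whole word are skipped, and each match stops at its own closing punctuation mark.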

Answer 4

Another approach:

def check_for_stop_punctuation(token):
    # True if the token contains a sentence-ending character.
    stop_punctuation = ['.', '?', '!']
    return any(mark in token for mark in stop_punctuation)

with open("file.txt", "r") as text_file:
    lines = text_file.read().split()
keyword = 'data'

sentence = []  # tokens of the sentence currently being read

i = 0
while i < len(lines):
    token = lines[i]
    sentence.append(token)
    if token == keyword:
        # Keep reading tokens until the sentence ends (or the input runs out).
        found_stop_punctuation = check_for_stop_punctuation(token)
        while not found_stop_punctuation and i + 1 < len(lines):
            i += 1
            token = lines[i]
            sentence.append(token)
            found_stop_punctuation = check_for_stop_punctuation(token)
        print(' '.join(sentence))
        sentence = []
    elif check_for_stop_punctuation(token):
        # The sentence ended without the keyword: start a new one.
        sentence = []
    i += 1
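One limitation of comparing tokens with ==: split() leaves punctuation attached, so a keyword at the end of a sentence ('data.') or before a comma ('data,') is never found. The sketch below shows one possible workaround, assuming it is acceptable to strip surrounding punctuation before comparing and to treat only a trailing ., ? or ! as the end of a sentence. The function name find_sentences and the sample text are hypothetical.

```python
import string

def find_sentences(tokens, keyword):
    # Collect tokens into sentences; keep a sentence if any token, with
    # surrounding punctuation stripped, equals the keyword.
    found, sentence, has_keyword = [], [], False
    for token in tokens:
        sentence.append(token)
        if token.strip(string.punctuation) == keyword:
            has_keyword = True
        if token.endswith(('.', '?', '!')):
            if has_keyword:
                found.append(' '.join(sentence))
            sentence, has_keyword = [], False
    # A trailing sentence without closing punctuation is still reported.
    if has_keyword:
        found.append(' '.join(sentence))
    return found

tokens = "Welcome to SOF! We store your data. Nothing else here.".split()
print(find_sentences(tokens, "data"))
```

Tracking a has_keyword flag per sentence also removes the need for the nested loop and the manual index.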

4 answers

Answer 1
2 accepted 2019-04-06 21:27:14

Answer 2
2 2019-04-06 23:05:47

Answer 3
1 2019-04-06 22:00:35

Answer 4
0 2019-04-06 21:33:13