
Python extract sentence after a keyword is found

I have a string built from some text I have extracted, and a list of keywords. I would like to run through the string and, wherever a keyword is found, extract only the sentence that follows the sentence containing it, with the full stop removed.

String

'Test string. removing data. keyword extraction. data number. 11123. final answer.'

Here is my list of key phrases:

lst= ['Test string', 'data number']

Desired output:

['removing data', '11123']

Could someone please help me out or point me in the right direction? Thanks

Here is my suggestion:

s='Test string. removing data. keyword extraction. data number. 11123. final answer.'

temp = [i.strip() for i in s.split('.')]

res = [temp[temp.index(i)+1] for i in lst]

print(res)

Output:

['removing data', '11123']

What it does:

temp = [i.strip() for i in s.split('.')]

s.split('.') converts your string into a list of strings, split at each dot, so you get each sentence separately:

['Test string', ' removing data', ' keyword extraction', ' data number', ' 11123', ' final answer', '']

This is wrapped in a list comprehension, which creates a new list from the one above with stripped values (i.strip() removes leading and trailing whitespace). So you end up with:

['Test string', 'removing data', 'keyword extraction', 'data number', '11123', 'final answer', '']

In the last step there are two interesting things:

  1. We use the list.index() method, which gives us the index of the first occurrence of the searched item. Then it is easy to get the next element.
  2. This is fast when you have a big string and few search items, but you should be careful: list.index() raises a ValueError if you search for an item that is not in the list.
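The failure mode from point 2 can be sketched as follows. This is only an illustration: the helper name next_sentence and the keyword 'missing phrase' are made up for the example and are not part of the original answer.

```python
s = 'Test string. removing data. keyword extraction. data number. 11123. final answer.'
temp = [part.strip() for part in s.split('.')]

def next_sentence(sentences, phrase):
    """Return the sentence following `phrase`, or None if it cannot be found."""
    try:
        idx = sentences.index(phrase)  # raises ValueError when phrase is absent
    except ValueError:
        return None
    # Guard against the phrase being the very last element of the list.
    return sentences[idx + 1] if idx + 1 < len(sentences) else None

print(next_sentence(temp, 'data number'))     # 11123
print(next_sentence(temp, 'missing phrase'))  # None
```

Returning None is just one possible choice here; you could equally skip missing keywords or let the exception propagate.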

It is safer to walk the list once and check each sentence against the keywords, so a keyword that is missing is simply skipped:

res = [temp[idx+1] for idx, val in enumerate(temp) if val in lst]

For more information on enumerate, check the documentation.
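As a variation on the same idea, here is a minimal sketch that pairs each sentence with its successor using zip, so no index arithmetic is needed. The function name sentences_after is hypothetical, and dropping empty pieces and using a set for the keywords are my own choices, not something the answer above prescribes.

```python
def sentences_after(text, phrases):
    """Return each sentence that directly follows a sentence listed in `phrases`."""
    # Split on dots, strip whitespace, and drop empty pieces (e.g. after the final dot).
    sentences = [part.strip() for part in text.split('.') if part.strip()]
    wanted = set(phrases)  # set membership checks are fast for long keyword lists
    # zip pairs every sentence with the one after it; the last sentence has no
    # successor, so it never appears as `cur` and no IndexError is possible.
    return [nxt for cur, nxt in zip(sentences, sentences[1:]) if cur in wanted]

s = 'Test string. removing data. keyword extraction. data number. 11123. final answer.'
print(sentences_after(s, ['Test string', 'data number']))  # ['removing data', '11123']
```

Note that the results come back in the order the sentences appear in the text, not in the order of the keyword list.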

Here's one solution. Essentially, you split the input on a dot followed by a space to make a list. Then you iterate over the list and check whether each element is one of the key phrases. If it is, you add the next element to your output list.

Code:

text = 'Test string. removing data. keyword extraction. data number. 11123. final answer.'
# Drop the final full stop first, otherwise the last item would be 'final answer.'
sentences = text.rstrip('.').split('. ')
lst = ['Test string', 'data number']
result = []
# Stop one short of the end so that sentences[i + 1] always exists
for i in range(len(sentences) - 1):
    if sentences[i] in lst:
        result.append(sentences[i + 1])
print(result)

Result:

['removing data', '11123']

Use a list comprehension, re.split and enumerate:

import re
my_str = 'Test string. removing data. keyword extraction. data number. 11123. final answer.'
key_phrases = ['Test string', 'data number']
my_str_phrases = re.split(r'[.]\s*', my_str)
print([my_str_phrases[idx + 1] for idx, item in enumerate(my_str_phrases) if item in key_phrases])
# ['removing data', '11123']

Note:
[.]\s* : A literal dot (it must either be placed inside a character class [] or escaped like this: \.), followed by 0 or more whitespace characters.
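To illustrate the note, here is a small sketch comparing the two ways of writing a literal dot with what happens when the dot is left unescaped. The variable names and the short test string 'a. b' are only there for the demonstration.

```python
import re

my_str = 'Test string. removing data. keyword extraction. data number. 11123. final answer.'

by_class = re.split(r'[.]\s*', my_str)   # dot inside a character class
by_escape = re.split(r'\.\s*', my_str)   # dot escaped with a backslash
print(by_class == by_escape)  # True: both patterns match a literal dot

# The string ends with a dot, so the last piece is an empty string.
print(by_class[-1] == '')  # True

# An unescaped dot matches any character, so every character becomes a separator
# and only empty strings are left over.
unescaped = re.split(r'.\s*', 'a. b')
print(all(piece == '' for piece in unescaped))  # True
```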

