Lookahead and lookbehind in regular expressions

Question

I want to print the 10 words before and the 10 words after a matched word in a string.

For example, I have:

string = "About the company -Our client is one of the world's fastest-growing AI-based contract management solution providers.Exp -7+ Years Location -MumbaiJob Role -Min 7years hands-on experience in Natural Language Processing, Machine Learning, Artificial Intelligence, and IBM Watson"

In the above string, I want to search for the word experience and get output like:

Location -MumbaiJob Role -Min 7years hands-on experience in Natural Language

I tried (\S+)\s+exp+, but it only returns the one word before the match.

Answer 1

Splitting the string into words on one or more whitespace characters is probably the best approach:

import re

s = "About the company -Our client is one of the world's fastest-growing AI-based contract management solution providers.Exp -7+ Years Location -MumbaiJob Role -Min 7years hands-on experience in Natural Language Processing, Machine Learning, Artificial Intelligence, and IBM Watson"

words = re.split(r'\s+', s)
try:
    index = words.index('experience')
except ValueError:
    pass  # 'experience' is not in the list, so there is nothing to print
else:
    start = max(index - 5, 0)
    end = min(index + 6, len(words))
    print(' '.join(words[start:end]))

Prints:

-MumbaiJob Role -Min 7years hands-on experience in Natural Language Processing, Machine
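The question asks for 10 words on each side, while the snippet above uses 5. A minimal sketch that turns the window size into a parameter is shown below. The function name `words_around` and its arguments are hypothetical, chosen for illustration, and it uses `str.split()` rather than `re.split` (both split on runs of whitespace, but `str.split()` also drops empty strings at the ends).

```python
def words_around(text, target, n=10):
    """Return up to n words before and after the first exact occurrence of target.

    Hypothetical helper for illustration; returns None if target is not a word in text.
    """
    words = text.split()  # split on runs of whitespace
    try:
        index = words.index(target)
    except ValueError:
        return None
    start = max(index - n, 0)               # clamp at the start of the list
    end = min(index + n + 1, len(words))    # +1 so the slice includes n words after
    return ' '.join(words[start:end])


s = "About the company -Our client is one of the world's fastest-growing AI-based contract management solution providers.Exp -7+ Years Location -MumbaiJob Role -Min 7years hands-on experience in Natural Language Processing, Machine Learning, Artificial Intelligence, and IBM Watson"

print(words_around(s, 'experience', 10))
```

Because the bounds are clamped, the function returns fewer than n words on a side when the target sits near the start or end of the string.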

But if you insist on using a regular expression, this should print up to 5 words preceding and up to 5 words following "experience":

import re

s = "About the company -Our client is one of the world's fastest-growing AI-based contract management solution providers.Exp -7+ Years Location -MumbaiJob Role -Min 7years hands-on experience in Natural Language Processing, Machine Learning, Artificial Intelligence, and IBM Watson"

m = re.search(r'([\w,;!.+-]+\s+){0,5}experience(\s+[\w,;!.+-]+){0,5}', s)
if m:
    print(m[0])

Prints:

-MumbaiJob Role -Min 7years hands-on experience in Natural Language Processing, Machine
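The pattern above hard-codes both the word and the count. As a rough sketch, the same idea can be built from parameters. The helper name `context_regex` is hypothetical, words are treated as runs of non-whitespace (`\S+`) instead of the character class used above, and the shortened sample string is only for illustration.

```python
import re


def context_regex(target, n=10):
    """Build a pattern for up to n whitespace-separated words on each side of target.

    Hypothetical helper; re.escape makes sure target is matched literally.
    """
    return re.compile(
        r'(?:\S+\s+){0,%d}%s(?:\s+\S+){0,%d}' % (n, re.escape(target), n)
    )


s = "Exp -7+ Years Location -MumbaiJob Role -Min 7years hands-on experience in Natural Language Processing, Machine Learning"

m = context_regex('experience', 3).search(s)
if m:
    print(m[0])
```

Since `re.search` returns the leftmost match and the `{0,n}` quantifier is greedy, the match takes as many preceding words as are available, up to n.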

Update to Handle "experience" or "Experience"

I have also simplified the regular expression:

import re

s = "About the company -Our client is one of the world's fastest-growing AI-based contract management solution providers.Exp -7+ Years Location -MumbaiJob Role -Min 7years hands-on Experience in Natural Language Processing, Machine Learning, Artificial Intelligence, and IBM Watson"

# By splitting on one or more whitespace characters:
words = re.split(r'\s+', s)
try:
    index = words.index('experience')
except ValueError:
    try:
        index = words.index('Experience')
    except ValueError:
        index = None
if index is not None:  # index 0 is a valid position, so compare against None
    start = max(index - 5, 0)
    end = min(index + 6, len(words))
    print(' '.join(words[start:end]))


# Using a regular expression:
m = re.search(r'(\S+\s+){0,5}[eE]xperience(\s+\S+){0,5}', s)
if m:
    print(m[0])

Prints:

-MumbaiJob Role -Min 7years hands-on Experience in Natural Language Processing, Machine
-MumbaiJob Role -Min 7years hands-on Experience in Natural Language Processing, Machine
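Listing each capitalization separately does not cover spellings such as "EXPERIENCE". One possible variation, sketched under the assumption that any letter case should match, compares case-folded tokens in the split approach and passes `re.IGNORECASE` in the regex approach. The shortened sample string and the variable names `split_result` and `regex_result` are made up for illustration.

```python
import re

s = "Location -MumbaiJob Role -Min 7years hands-on EXPERIENCE in Natural Language Processing, Machine Learning"

# Split approach: compare case-folded tokens instead of listing each spelling.
words = s.split()
index = next((i for i, w in enumerate(words) if w.casefold() == 'experience'), None)
split_result = None
if index is not None:
    start = max(index - 5, 0)
    end = min(index + 6, len(words))
    split_result = ' '.join(words[start:end])
print(split_result)

# Regex approach: the IGNORECASE flag replaces the [eE] character class.
m = re.search(r'(?:\S+\s+){0,5}experience(?:\s+\S+){0,5}', s, re.IGNORECASE)
regex_result = m[0] if m else None
print(regex_result)
```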

Answer 2

You can start by splitting the string on spaces, then take everything from the eleventh word to the end of the list, and finally join that list back into a string:

ts = string.split(' ')[10:]
print(" ".join(ts))

Answer 3

Please try the regex below:

((?:\S+\s){10})(experience)((?:\s\S+){10})

Here \1 will hold the 10 words before 'experience' and \3 will hold the 10 words after it.
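A short sketch of how these groups might be read in Python with `re.search`. The sample text is made up so that exactly ten words sit on each side of the target. Note that the fixed `{10}` quantifier requires ten words on both sides, so a string with fewer words on either side produces no match at all.

```python
import re

pattern = re.compile(r'((?:\S+\s){10})(experience)((?:\s\S+){10})')

# Made-up sample with exactly ten words on each side of the target.
text = "b1 b2 b3 b4 b5 b6 b7 b8 b9 b10 experience a1 a2 a3 a4 a5 a6 a7 a8 a9 a10"

m = pattern.search(text)
if m:
    before = m.group(1).strip()  # group 1 ends with a space, so strip it
    after = m.group(3).strip()   # group 3 starts with a space
    print(before)
    print(m.group(2))
    print(after)

# Fewer than ten words on either side: the pattern does not match.
short_match = pattern.search("only three words experience then two more")
print(short_match)
```

Changing `{10}` to `{0,10}` would allow shorter contexts, at the cost of also accepting an empty one.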

