
Lookahead and lookbehind in regular expressions

I want to print the 10 words before and the 10 words after a matched word in a string.

For example, I have:

string = "About the company -Our client is one of the world's fastest-growing AI-based contract management solution providers.Exp -7+ Years Location -MumbaiJob Role -Min 7years hands-on experience in Natural Language Processing, Machine Learning, Artificial Intelligence, and IBM Watson"

In the above string, I want to search for the word "experience" and get output like:

Location -MumbaiJob Role -Min 7years hands-on experience in Natural Language

I tried (\S+)\s+exp+, but it only returns the one word before the match.

Splitting the string on one or more whitespace characters is probably the best approach:

import re

s = "About the company -Our client is one of the world's fastest-growing AI-based contract management solution providers.Exp -7+ Years Location -MumbaiJob Role -Min 7years hands-on experience in Natural Language Processing, Machine Learning, Artificial Intelligence, and IBM Watson"

words = re.split(r'\s+', s)
try:
    index = words.index('experience')
except ValueError:
    # 'experience' does not occur as a standalone word
    pass
else:
    # Keep up to 5 words on each side of the match
    start = max(index - 5, 0)
    end = min(index + 6, len(words))
    print(' '.join(words[start:end]))

Prints:

-MumbaiJob Role -Min 7years hands-on experience in Natural Language Processing, Machine
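The splitting approach can be wrapped in a small helper so that the window size becomes a parameter, for instance 10 words per side as the question asks. This is only a sketch: the function name words_around and its parameters are hypothetical, and it uses str.split() instead of re.split so that leading or trailing whitespace does not produce empty strings.

```python
def words_around(text, target, n=10):
    """Return up to n words before and after the first occurrence of target.

    Hypothetical helper: target must equal a whole whitespace-separated token.
    Returns None when target is not found.
    """
    words = text.split()  # splits on runs of whitespace, drops empty strings
    try:
        index = words.index(target)
    except ValueError:
        return None
    start = max(index - n, 0)             # clamp at the start of the list
    end = min(index + n + 1, len(words))  # +1 so the target itself is included
    return ' '.join(words[start:end])


s = "About the company -Our client is one of the world's fastest-growing AI-based contract management solution providers.Exp -7+ Years Location -MumbaiJob Role -Min 7years hands-on experience in Natural Language Processing, Machine Learning, Artificial Intelligence, and IBM Watson"

print(words_around(s, 'experience', n=10))
```

Because the comparison is on whole tokens, a word with attached punctuation such as "experience," would not be found by this sketch.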

But if you insist on using a regular expression, this should print up to 5 words preceding and 5 words following "experience":

import re

s = "About the company -Our client is one of the world's fastest-growing AI-based contract management solution providers.Exp -7+ Years Location -MumbaiJob Role -Min 7years hands-on experience in Natural Language Processing, Machine Learning, Artificial Intelligence, and IBM Watson"

m = re.search(r'([\w,;!.+-]+\s+){0,5}experience(\s+[\w,;!.+-]+){0,5}', s)
if m:
    print(m[0])

Prints:

-MumbaiJob Role -Min 7years hands-on experience in Natural Language Processing, Machine
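The regular-expression variant can be parameterized in the same way. The sketch below builds the pattern from a word count n and escapes the search word with re.escape; the function name regex_window is hypothetical, and \S+ is assumed to be an acceptable definition of "word". Note that the target is matched as a substring, so "experience" would also match inside "experienced".

```python
import re


def regex_window(text, target, n=10):
    """Return the target with up to n whitespace-separated words on each side.

    Hypothetical helper; returns None when there is no match.
    """
    # Doubled braces are literal braces in an f-string, so with n=10 this
    # yields (?:\S+\s+){0,10}<target>(?:\s+\S+){0,10}
    pattern = rf'(?:\S+\s+){{0,{n}}}{re.escape(target)}(?:\s+\S+){{0,{n}}}'
    m = re.search(pattern, text)
    return m[0] if m else None


s = "About the company -Our client is one of the world's fastest-growing AI-based contract management solution providers.Exp -7+ Years Location -MumbaiJob Role -Min 7years hands-on experience in Natural Language Processing, Machine Learning, Artificial Intelligence, and IBM Watson"

print(regex_window(s, 'experience', n=10))
```

The {0,n} quantifiers are what let the match succeed when fewer than n words are available on either side.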

Update to handle "experience" or "Experience"

I have also simplified the regular expression:

import re

s = "About the company -Our client is one of the world's fastest-growing AI-based contract management solution providers.Exp -7+ Years Location -MumbaiJob Role -Min 7years hands-on Experience in Natural Language Processing, Machine Learning, Artificial Intelligence, and IBM Watson"

# By splitting on one or more whitespace characters:
words = re.split(r'\s+', s)
try:
    index = words.index('experience')
except ValueError:
    try:
        index = words.index('Experience')
    except ValueError:
        index = None
# Compare against None explicitly: index 0 is a valid match but is falsy
if index is not None:
    start = max(index - 5, 0)
    end = min(index + 6, len(words))
    print(' '.join(words[start:end]))


# Using a regular expression:
m = re.search(r'(\S+\s+){0,5}[eE]xperience(\s+\S+){0,5}', s)
if m:
    print(m[0])

Prints:

-MumbaiJob Role -Min 7years hands-on Experience in Natural Language Processing, Machine
-MumbaiJob Role -Min 7years hands-on Experience in Natural Language Processing, Machine
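If any capitalization should match, not just "experience" and "Experience", one option is a case-insensitive comparison. The following is a minimal sketch under that assumption: it uses str.casefold() for the split-based search and the re.IGNORECASE flag for the regex. The helper name find_word_ignore_case and the shortened sample string are made up for illustration.

```python
import re


def find_word_ignore_case(words, target):
    """Return the index of the first word equal to target ignoring case, else None."""
    target = target.casefold()
    return next((i for i, w in enumerate(words) if w.casefold() == target), None)


sample = "Min 7years hands-on EXPERIENCE in Natural Language Processing"
n = 2  # words of context on each side

# Split-based search
words = sample.split()
index = find_word_ignore_case(words, 'experience')
split_result = None
if index is not None:
    split_result = ' '.join(words[max(index - n, 0):index + n + 1])
print(split_result)

# Regex-based search with the IGNORECASE flag
m = re.search(r'(?:\S+\s+){0,2}experience(?:\s+\S+){0,2}', sample, flags=re.IGNORECASE)
regex_result = m[0] if m else None
print(regex_result)
```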

You can start by splitting the string on spaces, then take the slice from the 11th word to the end of the list, and finally join that list back into a string:

ts = string.split(' ')[10:]
print(" ".join(ts))

Please try the regex below:

((?:\S+\s){10})(experience)((?:\s\S+){10})

Here \1 holds the 10 words before 'experience' and \3 holds the 10 words after it.
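As a rough illustration of how those groups could be read in Python, the sketch below applies the same pattern with re.search and strips the whitespace that the groups capture along with the words. The variable names are arbitrary. Because the quantifier is {10} rather than {0,10}, the pattern only matches when exactly 10 words are available on each side, which the last line demonstrates on a short string.

```python
import re

s = "About the company -Our client is one of the world's fastest-growing AI-based contract management solution providers.Exp -7+ Years Location -MumbaiJob Role -Min 7years hands-on experience in Natural Language Processing, Machine Learning, Artificial Intelligence, and IBM Watson"

pattern = re.compile(r'((?:\S+\s){10})(experience)((?:\s\S+){10})')

m = pattern.search(s)
before = after = None
if m:
    before = m.group(1).strip()  # group 1 ends with a whitespace character
    after = m.group(3).strip()   # group 3 starts with a whitespace character
    print(before)
    print(m.group(2))
    print(after)

# Fewer than 10 words on either side: no match at all
short_match = pattern.search("hands-on experience in NLP")
print(short_match)
```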

