
Why is the following negative lookahead not working?

import re
txt =  'harry potter is awsome  so is harry james potter'
pat = '\W+(?!potter)'
re.findall(pat,txt)

According to my understanding, the output should have been all the words that are not followed by potter, that is:

['potter', 'is', 'awsome', 'so', 'is', 'harry', 'james', 'potter']

But the actual output is:

['harry', 'potter', 'is', 'awsome', 'so', 'is', 'harry', 'james', 'potter']

Why is the pattern also matching the harry that is followed by potter?

Because " potte" doesn't match "potter": the lookahead is checked right after the word, so the text it sees begins with the space.
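To make that concrete, here is a small sketch that prints, for every word, the six characters a lookahead like (?!potter) would inspect. It assumes the pattern under discussion is the lowercase \w+(?!potter); the variable name following is only illustrative.

```python
import re

txt = 'harry potter is awsome  so is harry james potter'

# Pair each word with the six characters that come right after it,
# which is the text (?!potter) would be compared against.
following = [(m.group(), txt[m.end():m.end() + 6])
             for m in re.finditer(r'\w+', txt)]

for word, nxt in following:
    print(repr(word), '->', repr(nxt))
# The first line printed is: 'harry' -> ' potte'
```

None of the six-character windows equals "potter", because each one starts with a space (or is empty at the end of the string), so the negative lookahead never rejects anything.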

>>> txt = 'harry potter is awsome  so is harry james potter'
>>> pat = '(\w+)(?:\W|\Z)(?!potter)'
>>> re.findall(pat,txt)
['potter', 'is', 'awsome', 'so', 'is', 'harry', 'potter']

"According to my understanding the output should have been all the words that are not followed by potter"

It does. The thing is, every word is not followed by potter, because every word, by definition, is followed by either whitespace or the end of the string.
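One way to check this claim, assuming again that the question's pattern is the lowercase \w+(?!potter), is to compare its matches with those of plain \w+. The names with_lookahead and without_lookahead are just for this sketch.

```python
import re

txt = 'harry potter is awsome  so is harry james potter'

# The lookahead is tested directly after the word, where the next
# character is a space or the end of the string, never the letter "p".
with_lookahead = re.findall(r'\w+(?!potter)', txt)
without_lookahead = re.findall(r'\w+', txt)

print(with_lookahead == without_lookahead)  # True
```

On this input the lookahead changes nothing, which is why harry still shows up in the result.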

import re

txt = 'harry potter is awsome  so is harry james potter'

# Match a whole word only if it is not followed by spaces and then "potter"
pat = r'\w+\b(?![\ ]+potter)'

print(re.findall(pat, txt))

I get this result:

[' ', ' ', '  ', ' ', ' ', ' ']

...which is exactly what I expect. \W+ (note the uppercase W) matches one or more non-word characters, so \W+(?!potter) matches the whitespace between the words in your input, except when the upcoming word starts with "potter". If I wanted to match each word that's not followed by the word "potter" I would use this regex:

pat = r'\b\w+\b(?!\W+potter\b)'

\b matches a word boundary; the first two ensure that I'm matching a whole word, and the last one makes sure the upcoming word is "potter" and not a longer word that starts with "potter".
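As a quick sketch of what that regex does, the snippet below runs it on the question's input and then on a second, made-up string ('harry potters wheel', not from the question) to show the effect of the final \b. The names pat_loose, strict and loose are illustrative only.

```python
import re

txt = 'harry potter is awsome  so is harry james potter'
pat = r'\b\w+\b(?!\W+potter\b)'

# Words directly before "potter" (the first harry, and james) are skipped.
words = re.findall(pat, txt)
print(words)
# ['potter', 'is', 'awsome', 'so', 'is', 'harry', 'potter']

# Hypothetical input where the next word merely starts with "potter".
txt2 = 'harry potters wheel'
pat_loose = r'\b\w+\b(?!\W+potter)'  # same regex without the final \b

strict = re.findall(pat, txt2)        # "potters" is not the word "potter"
loose = re.findall(pat_loose, txt2)   # "potters" starts with "potter"
print(strict)  # ['harry', 'potters', 'wheel']
print(loose)   # ['potters', 'wheel']
```

With the trailing \b, harry is kept in front of "potters"; without it, harry is dropped because the lookahead only needs the next word to begin with "potter".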

Notice how I used a raw string (r'...'). You should get in the habit of using them for all your regexes in Python. In this case, \b would be interpreted as a backspace character if I had used a normal string.
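The difference is easy to see in a short sketch. The pattern \bpotter\b and the variable names here are chosen only for illustration.

```python
import re

txt = 'harry potter'

raw_pat = r'\bpotter\b'    # backslash + "b": the regex word-boundary token
plain_pat = '\bpotter\b'   # Python turns each \b into a backspace (0x08)

print(len(raw_pat), len(plain_pat))  # 10 8
print(plain_pat[0] == '\x08')        # True

raw_hits = re.findall(raw_pat, txt)
plain_hits = re.findall(plain_pat, txt)
print(raw_hits)    # ['potter']
print(plain_hits)  # []
```

The normal string asks the regex engine to find literal backspace characters around "potter", which the text does not contain, so nothing matches.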
