
Regex match if not before and after

How can I match 'suck' only if not part of 'honeysuckle'?

Using a lookbehind and a lookahead I can match 'suck' when it is not preceded by 'honey' and not followed by 'le', but this also rejects something like 'honeysucker'. There the expression should match, because the word doesn't end in 'le':

re.search(r'(?<!honey)suck(?!le)', 'honeysucker')
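
To make the problem concrete, here is a small check of that pattern against a few words. The word list is only illustrative and the variable name naive is made up for this sketch:

```python
import re

# The pattern from the question: two independent lookarounds.
naive = re.compile(r"(?<!honey)suck(?!le)")

for text in ["suck", "honeysuckle", "honeysucker", "suckle"]:
    # Each lookaround can veto the match on its own, so "honey" before
    # OR "le" after is enough to reject the word.
    print(text, bool(naive.search(text)))
```

Only plain 'suck' gets through; 'honeysucker' and 'suckle' are rejected along with 'honeysuckle', which is stricter than what I want.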

You need to nest the lookaround assertions:

>>> import re
>>> regex = re.compile(r"(?<!honey(?=suckle))suck")
>>> regex.search("honeysuckle")
>>> regex.search("honeysucker")
<_sre.SRE_Match object at 0x00000000029B6370>
>>> regex.search("suckle")
<_sre.SRE_Match object at 0x00000000029B63D8>
>>> regex.search("suck")
<_sre.SRE_Match object at 0x00000000029B6370>

An equivalent solution would be suck(?!(?<=honeysuck)le).
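
As a rough check that the two nested patterns agree, the sketch below runs both over a handful of strings and prints the span each one finds. The sample strings and variable names are assumptions added for illustration:

```python
import re

# Both patterns are taken from the answer; the sample strings are illustrative.
nested_lookbehind = re.compile(r"(?<!honey(?=suckle))suck")
nested_lookahead = re.compile(r"suck(?!(?<=honeysuck)le)")

samples = ["honeysuckle", "honeysucker", "suckle", "suck", "honeysuckle sucks"]

for text in samples:
    a = nested_lookbehind.search(text)
    b = nested_lookahead.search(text)
    # Show where each pattern matched, or None if it found nothing.
    print(text, a and a.span(), b and b.span())
```

In both patterns the inner assertion only fires when 'honey' before and 'le' after are present together, which is why 'honeysucker' and 'suckle' still match.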

Here is a solution without lookaround assertions. First remove every occurrence of 'honeysuckle' from the string:

s = s.replace('honeysuckle','')

and now:

re.search('suck',s)

This works for any of these strings: 'honeysuckle sucks', 'this sucks' and even 'regular expressions suck'.
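
The two steps can be wrapped in one helper. This is only a sketch: the name contains_suck is made up, and it uses the in operator instead of re.search, which should give the same yes/no answer for a literal string like 'suck':

```python
def contains_suck(text):
    # Drop every literal "honeysuckle", then do a plain substring test.
    # Caveat: this only says whether a match exists, not where it was
    # in the original string.
    return "suck" in text.replace("honeysuckle", "")


for text in ["honeysuckle sucks", "this sucks", "regular expressions suck", "honeysuckle"]:
    print(text, contains_suck(text))
```

One thing to keep in mind is that the removal changes the string, so any match positions would refer to the shortened text rather than the original.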

I believe you should keep your exceptions in a separate Array, in case you want to add more rules later. This is easier to read and quicker to change when needed.

My suggestion in Ruby is:

words = ['honeysuck', 'suckle', 'HONEYSUCKER', 'honeysuckle']

EXCEPTIONS = ['honeysuckle']

def match_suck word
  if (word =~ /suck/i) != nil
    # should not match any of the exceptions
    return true unless EXCEPTIONS.include? word.downcase
  end
  false
end

words.each{ |w|
  puts "Testing match of '#{w}' : #{match_suck(w)}"
}
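
For comparison with the Python answers, the same exception-list idea could look roughly like this in Python. It is a direct port of the Ruby above under the same assumption that each input is a single word; the names are kept from the Ruby version:

```python
WORDS = ["honeysuck", "suckle", "HONEYSUCKER", "honeysuckle"]

# Whole words that must never count as a match.
EXCEPTIONS = {"honeysuckle"}


def match_suck(word):
    lowered = word.lower()
    # Case-insensitive substring test, then reject listed exceptions.
    return "suck" in lowered and lowered not in EXCEPTIONS


for w in WORDS:
    print("Testing match of '%s' : %s" % (w, match_suck(w)))
```

Like the Ruby version, this compares the whole word against the list, so it will not exclude 'honeysuckle' when it appears inside a longer string.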
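
If a plain substring test is enough, no regex is needed (note that this also returns True for 'honeysuckle'):

>>> string = 'honeysucker'
>>> print('suck' in string)
True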

