
What's wrong with this regular expression?

I have the following regular expression to find a word in text and highlight it.

I'm using the word surface for testing purposes.

/((?<=[\W])surface?(?![\w]))|((?<![\w])surface?(?=[\W]))/iu

It matches all occurrences in the following text:

surface-CoP-20-70-0000-04-02_Pre-Run_Tool_Verification_Programming_and_surface_Tare surface_revC.pdf

But if I change the first occurrence of surface to contain an uppercase letter, it only matches the first occurrence:

Surface-CoP-20-70-0000-04-02_Pre-Run_Tool_Verification_Programming_and_surface_Tare surface_revC.pdf

Or if I put an uppercase letter in some of the other occurrences, it matches those as well:

Surface-CoP-20-70-0000-04-02_Pre-Run_Tool_Verification_Programming_and_Surface_Tare surface_revC.pdf

I'm not sure what you're trying to achieve there, but possibly your problem is that \w includes _ (and \W excludes it), so an underscore next to surface counts as a word character.
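A quick way to check this is to test the character classes directly. The following is a minimal sketch in Python's re module rather than the /.../iu syntax used in the question, so the modifiers are spelled as flags. The sample strings and variable names are made up for illustration.

```python
import re

# "_" counts as a word character: \w matches it, \W does not.
underscore_is_word = re.fullmatch(r"\w", "_") is not None
underscore_is_nonword = re.fullmatch(r"\W", "_") is not None

# A lookaround built on \w therefore treats "_surface_" as part of a longer word.
strict = re.compile(r"(?<!\w)surface(?!\w)", re.IGNORECASE)
hit_between_underscores = strict.search("and_surface_Tare") is not None
hit_between_hyphens = strict.search("and-surface-Tare") is not None

print(underscore_is_word, underscore_is_nonword)
print(hit_between_underscores, hit_between_hyphens)
```

In this sketch the underscore-delimited occurrence is skipped while the hyphen-delimited one is found.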

Maybe try this:

/(?<![a-z])surface(?![a-z])/iu

Or this:

/(?<=[\W_])surface(?=[\W_])/iu
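The two suggestions above can be compared on the mixed-case file name from the question. This is only a sketch under the assumption that Python's re behaves like the asker's engine for these constructs. The names no_adjacent_letter and needs_separator are illustrative.

```python
import re

# File name from the question, with mixed-case occurrences.
text = ("Surface-CoP-20-70-0000-04-02_Pre-Run_Tool_Verification_"
        "Programming_and_Surface_Tare surface_revC.pdf")

# Suggestion 1: reject a match only when a letter is adjacent.
no_adjacent_letter = re.compile(r"(?<![a-z])surface(?![a-z])", re.IGNORECASE)

# Suggestion 2: require a non-word character or "_" on both sides.
needs_separator = re.compile(r"(?<=[\W_])surface(?=[\W_])", re.IGNORECASE)

first = no_adjacent_letter.findall(text)
second = needs_separator.findall(text)
print(first)
print(second)
```

Under Python's re, the second pattern skips the occurrence at the very start of the string, because its lookbehind needs a preceding character. The first pattern finds all three, but it would also accept surface2010, since a digit is not a letter.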

Otherwise, please provide more details on what exactly you do/don't want to match.


Update: given this information:

surface2010 should not be matched

In that case, I suspect you want:

/(?<=\b|_)surface(?=\b|_)/iu

(Plain \b alone would not match inside "...and_surface_Tare...", because _ is a word character, so the alternation with _ is added to allow it.)
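To try the same idea in Python, note that Python's re module requires fixed-width lookbehind, so the alternation is moved outside the lookaround in this sketch. The rewritten pattern is an adaptation, not the pattern given above, and the sample strings are invented for illustration.

```python
import re

# Python's re requires fixed-width lookbehind, so (?<=\b|_) is spelled as
# "word boundary here, or an underscore just before" instead.
pattern = re.compile(r"(?:\b|(?<=_))surface(?:\b|(?=_))", re.IGNORECASE)

samples = ["surface2010", "and_surface_Tare", "Surface-CoP", "resurface"]
results = {s: bool(pattern.search(s)) for s in samples}

# For comparison: a plain word-boundary pattern on the underscore case.
plain_boundary = bool(re.search(r"\bsurface\b", "and_surface_Tare", re.IGNORECASE))

print(results)
print(plain_boundary)
```

Here surface2010 and resurface are rejected, while the underscore-delimited and hyphen-delimited occurrences are accepted. The plain \b version misses and_surface_Tare.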

Am I missing something?

/\bsurface\b/i

So you want to match surface case-insensitively unless it's preceded or followed immediately by a letter or digit? Try this:

/(?<![A-Za-z0-9])surface(?![A-Za-z0-9])/i
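Since the goal in the question is highlighting, here is a small sketch of how this pattern might be applied in Python. The highlight function, its bracket markers and the sample string are hypothetical and only serve the illustration.

```python
import re

# Pattern from the answer above, with the /i modifier expressed as a flag.
WORD = re.compile(r"(?<![A-Za-z0-9])surface(?![A-Za-z0-9])", re.IGNORECASE)

def highlight(text: str, open_mark: str = "[", close_mark: str = "]") -> str:
    """Wrap each match in marker strings, keeping the original casing."""
    return WORD.sub(lambda m: f"{open_mark}{m.group(0)}{close_mark}", text)

print(highlight("Surface-CoP_and_surface_Tare surface2010"))
```

The replacement is built in a function rather than a template string, so the markers can contain backslashes or group-like text without being reinterpreted.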

I left off the /u modifier (which causes the regex and the subject string to be treated as UTF-8) because you appear to be dealing with pure ASCII text. \w, \W and \b are not affected by /u anyway.
