C# Regex - only match if substring exists?

Ok, so I think I've got a handle on negation - now what about only selecting a match that has a specified substring within it?

Given:

This is a random bit of information from 0 to 1.
  This is a non-random bit of information I do NOT want to match
This is the end of this bit

This is a random bit of information from 0 to 1.
  This is a random bit of information I do want to match
This is the end of this bit

And attempting the following regex:

/(?s)This is a random bit(?:(?=This is a random).)*?This is the end/g

Why isn't this working? What am I missing?

I'm using regexstorm.com for testing...

You broke the tempered greedy token by turning its negative lookahead into a positive one. It cannot work that way: the positive lookahead requires the text to start with "This is a random" at every position the token tries to consume after "This is a random bit", and that is never the case in your input.

You need:

  • Match the leading delimiter (This is a random bit)
  • Match zero or more characters, none of which starts the leading delimiter, the closing delimiter, or the required text inside this block
  • Match the specific string inside (This is a random)
  • Match zero or more characters, none of which starts the leading or closing delimiter
  • Match the closing delimiter (This is the end)

So, use

(?s)This is a random bit(?:(?!This is a random bit|This is the end|This is a random).)*This is a random(?:(?!This is a random bit|This is the end).)*This is the end

See the regex demo

  • (?s) - Singleline (DOTALL) mode on (. also matches a newline)
  • This is a random bit - leading delimiter
  • (?:                        # Start of the tempered greedy token
      (?!This is a random bit  # Leading delimiter
        | This is the end      # Trailing delimiter
        | This is a random)    # Specific string inside
      .                        # Any character
    )*                         # End of tempered greedy token
  • This is a random - specified substring
  • (?:(?!This is a random bit|This is the end).)* - Another tempered greedy token, matching any text that does not start a leading/closing delimiter, up to the first...
  • This is the end - trailing delimiter
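
The breakdown above can be sketched in C# roughly as follows. This is only an illustration under a few assumptions: the names BlockMatcher, FindBlocks, Open, Close and Required are made up for the example, RegexOptions.Singleline stands in for the inline (?s), and the sample text is the one from the question joined with "\n". The literals contain no regex metacharacters, so they are concatenated as-is; other delimiters would likely need Regex.Escape.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class BlockMatcher
{
    // Literal pieces taken from the question's sample text (names are hypothetical).
    const string Open = "This is a random bit";
    const string Close = "This is the end";
    const string Required = "This is a random";

    // Assembled in the same order as the bullet list above.
    public static readonly Regex Pattern = new Regex(
        Open                                                     // leading delimiter
        + "(?:(?!" + Open + "|" + Close + "|" + Required + ").)*" // first tempered greedy token
        + Required                                               // required substring
        + "(?:(?!" + Open + "|" + Close + ").)*"                 // second tempered greedy token
        + Close,                                                 // closing delimiter
        RegexOptions.Singleline);                                // same effect as (?s)

    // Returns the text of every block that contains the required substring.
    public static List<string> FindBlocks(string input)
    {
        var blocks = new List<string>();
        foreach (Match m in Pattern.Matches(input))
            blocks.Add(m.Value);
        return blocks;
    }

    public static void Main()
    {
        string input = string.Join("\n", new[]
        {
            "This is a random bit of information from 0 to 1.",
            "  This is a non-random bit of information I do NOT want to match",
            "This is the end of this bit",
            "",
            "This is a random bit of information from 0 to 1.",
            "  This is a random bit of information I do want to match",
            "This is the end of this bit",
        });

        // Only the second block should be reported: the first one has no
        // "This is a random" between its delimiters.
        foreach (string block in FindBlocks(input))
            Console.WriteLine(block);
    }
}
```

Note that the match stops at "This is the end", so the trailing " of this bit" is not part of the reported text.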

Note that (?:(?=This is a random).) can only match once in a row, never twice, even when it is quantified. Take a text starting with Th as an example: at the T, the lookahead is satisfied. Once the T is consumed, the next character is h, which can never satisfy a lookahead that starts with Th. The engine then moves on to the next part of the pattern and never returns to the lookahead. Use a negative lookahead instead.
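
A small C# check of this behaviour, as a sketch only: the class name LookaheadDemo, the field names and the test strings are invented for the illustration, and + is used instead of * so that empty matches do not clutter the output.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookaheadDemo
{
    // Positive lookahead: the position must START "This is a random",
    // so after consuming the "T" the next position ("his is ...") fails.
    public static readonly Regex Positive =
        new Regex("(?:(?=This is a random).)+", RegexOptions.Singleline);

    // Negative lookahead: any character is fine as long as the closing
    // delimiter does not start at that position.
    public static readonly Regex Negative =
        new Regex("(?:(?!This is the end).)+", RegexOptions.Singleline);

    public static void Main()
    {
        // Every match of the positive version is a single "T".
        foreach (Match m in Positive.Matches("This is a random This is a random"))
            Console.WriteLine("positive: '" + m.Value + "' (length " + m.Length + ")");

        // The negative version consumes the whole run before the delimiter.
        Match run = Negative.Match("some text This is the end");
        Console.WriteLine("negative: '" + run.Value + "'");
    }
}
```

The positive version can never grow past one character per match, which is why the original pattern finds nothing, while the negative version keeps consuming until the delimiter is about to start.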

