
How to match words that don't start or end with certain characters using Regex?

I want to find word matches that neither start nor end with certain specific characters.

For example, given this input, I only want to match the unquoted word in the middle:

"string" string 'string'

The other words, which start and end with either " or ', should be excluded.

I am currently using this pattern:

[enter image description here]

But I do not know what pattern I should use that would exclude words that start and end with some specified characters.

Can someone give me some advice on what pattern I should use? Thank you.

The pattern you're currently using matches the quoted words as well, because \b correctly asserts the positions between " and s and between g and " (each is a position between a word character [a-zA-Z0-9_] and a non-word character). You can use one of the following methods:

  1. Negate specific characters (negative lookbehind/lookahead)
    • This method lets you specify a character, set of characters, or substring that must not appear next to a match.
    • (?<!['"])\bstring\b(?!['"])
      • (?<!['"]) - ensure neither ' nor " precedes the word.
      • (?!['"]) - ensure neither ' nor " follows the word.
  2. Allow specific characters (positive lookbehind/lookahead)
    • This method lets you specify a character, set of characters, or substring that must appear next to a match.
    • (?<=\s|^)\bstring\b(?=\s|$)
      • (?<=\s|^) - ensure whitespace or the beginning of the line precedes.
      • (?=\s|$) - ensure whitespace or the end of the line follows.
  3. A combination of both of the above
    • This method lets you negate specific cases while allowing others (not commonly used and not really needed for the problem presented, but it may be useful to you or others).
    • Something like (?<=\s|^)string(?=\s+(?!stop)|$) would ensure the word isn't followed by the word stop.
    • Something like (?<=(?<!stop\s*)\s+|^)string(?=\s+|$) would ensure the word doesn't follow the word stop. Note that quantifiers (such as \s+) in lookbehinds are not allowed in most regex engines; .NET allows them.
    • Something like (?<=\s|^)\bstring\b(?=\s|$)(?!\z) would ensure the word isn't at the end of the string (which differs from the end of a line in multi-line mode).
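As a rough check of methods 1 and 2, the following sketch runs them against the input from the question and prints the start index of each match. The class name LookaroundDemo and the helper MatchIndexes are made up for illustration, and a plain \bstring\b is included only as a baseline for comparison.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class LookaroundDemo
{
    // Hypothetical helper: returns the start index of every match of pattern in input.
    public static int[] MatchIndexes(string pattern, string input) =>
        Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Index).ToArray();

    public static void Main()
    {
        // The input from the question: only the middle word is unquoted.
        const string input = "\"string\" string 'string'";

        string[] patterns =
        {
            @"\bstring\b",                    // baseline: word boundaries only
            @"(?<!['""])\bstring\b(?!['""])", // method 1: negative lookarounds
            @"(?<=\s|^)\bstring\b(?=\s|$)",   // method 2: positive lookarounds
        };

        foreach (var pattern in patterns)
        {
            var indexes = MatchIndexes(pattern, input);
            Console.WriteLine($"{pattern} -> [{string.Join(", ", indexes)}]");
        }
    }
}
```

With this input, the baseline should report matches at indexes 1, 9 and 17, while both lookaround patterns should report only index 9, the unquoted word.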

This regex matches string only when it has a whitespace character on both sides: \sstring\s

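using System;
using System.Text.RegularExpressions;

var sample = "\"string\" string \"string\" astring 'string_ string?string string ";
// \s on each side requires (and consumes) one whitespace character around the word.
var regx = new Regex(@"\sstring\s");
var matches = regx.Matches(sample);
foreach (Match mt in matches)
{
    // Print the matched text, its start index, and its length.
    Console.WriteLine($"{mt.Value} {mt.Index,3} {mt.Length,3}");
}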
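One thing to keep in mind with \sstring\s is that the whitespace becomes part of the match, so a word at the very start or end of the input is skipped, and two adjacent words cannot both match because they share a single space. The sketch below contrasts it with a lookaround-based pattern on a made-up input; the class name, the helper Describe and the test input are assumptions for illustration only.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class ConsumingVersusLookaround
{
    // Hypothetical helper: returns "index:length" for every match of pattern in input.
    public static string[] Describe(string pattern, string input) =>
        Regex.Matches(input, pattern).Cast<Match>()
             .Select(m => $"{m.Index}:{m.Length}").ToArray();

    public static void Main()
    {
        // Made-up input: the word at the start, in the middle and at the end.
        const string input = "string string string";

        // \s consumes the surrounding whitespace, so only the middle word matches.
        Console.WriteLine(string.Join(" ", Describe(@"\sstring\s", input)));
        // prints: 6:8

        // Lookarounds only assert the context, so all three words match.
        Console.WriteLine(string.Join(" ", Describe(@"(?<=\s|^)string(?=\s|$)", input)));
        // prints: 0:6 7:6 14:6
    }
}
```

If only the word itself is wanted from \sstring\s, a capturing group such as \s(string)\s is another option, though it still cannot match at the start or end of the input.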
