C# Regex whitespace between capturing groups
So basically, my input string is some kind of text containing keywords that I want to match, provided that:
(|\s\W)
bar does not match foobarbaz
E.g.:
input: "#foo barbazboo tree car"
keywords: {"foo", "bar", "baz", "boo", "tree", "car"}
I am dynamically generating a Regex in C# using an enumerable of keywords and a StringBuilder:
// Build one alternation branch per keyword: optional separator, keyword, optional separator.
StringBuilder sb = new();
foreach (var kwd in keywords)
{
    sb.Append($"((|[\\s\\W]){kwd}([\\s\\W]|))|");
}
sb.Remove(sb.Length - 1, 1); // drop the trailing '|'
_regex = new Regex(sb.ToString(), RegexOptions.Compiled | RegexOptions.IgnoreCase);
Testing this pattern on regexr.com, the given input matches all keywords. However, I do not want {bar, baz, boo} included, since there is no whitespace between those keywords. Ideally, I'd want my regex to only match {foo, tree, car}.
Modifying my pattern to (( |[\s\W])kwd([\s\W]| )) causes {bar, baz, boo} not to be included, but produces bogus results for {tree, car}: each match consumes the separator on both sides, so there would have to be at least two spaces between keywords.
How do I specify "there may be only one whitespace separating two keywords", or, to put it differently, "half a whitespace is ok", while preserving the ability to create the regex dynamically?
In your case, you need to build the pattern like this:
var pattern = $@"\b(?:{string.Join("|", keywords.OrderByDescending(x => x.Length).Select(Regex.Escape))})\b";
_regex = new Regex(pattern, RegexOptions.Compiled | RegexOptions.IgnoreCase);
Here, the longer keywords come before the shorter ones, so, if you have foo, bar and foo bar, the pattern will look like \b(?:foo\ bar|foo|bar)\b and will match foo bar, and not foo and bar separately, wherever such a match is possible.
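To see the word-boundary pattern in action, here is a minimal sketch run against the input from the question and against the foo / bar / foo bar case. The class `WordBoundaryDemo`, the helper `FindKeywords`, and its `longestFirst` flag are illustrative names that are not part of the original code. The flag exists only so the effect of the ordering step can be compared.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class WordBoundaryDemo
{
    // Hypothetical helper: builds \b(?:kw1|kw2|...)\b and returns every matched value.
    // longestFirst toggles the OrderByDescending step so its effect can be compared.
    public static List<string> FindKeywords(string input, IEnumerable<string> keywords, bool longestFirst = true)
    {
        IEnumerable<string> ordered = longestFirst
            ? keywords.OrderByDescending(x => x.Length)
            : keywords;
        var pattern = $@"\b(?:{string.Join("|", ordered.Select(Regex.Escape))})\b";
        var regex = new Regex(pattern, RegexOptions.IgnoreCase);
        return regex.Matches(input).Cast<Match>().Select(m => m.Value).ToList();
    }

    public static void Main()
    {
        var fromQuestion = new[] { "foo", "bar", "baz", "boo", "tree", "car" };
        Console.WriteLine(string.Join(", ", FindKeywords("#foo barbazboo tree car", fromQuestion)));
        // foo, tree, car

        var overlapping = new[] { "foo", "bar", "foo bar" };
        Console.WriteLine(string.Join(", ", FindKeywords("a foo bar b", overlapping)));
        // foo bar
        Console.WriteLine(string.Join(", ", FindKeywords("a foo bar b", overlapping, longestFirst: false)));
        // foo, bar
    }
}
```

Because \b is zero-width, a single space can serve as the boundary for the keyword on each side of it, which is the "half a whitespace" behavior asked for.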
In case your keywords can look like keywords: {"$foo", "^bar^", "[baz]", "(boo)", "tree+", "+car"}, i.e. they can have special chars at the start/end of the keyword, you can use
_regex = new Regex($@"(?!\B\w)(?:{string.Join("|", keywords.OrderByDescending(x => x.Length).Select(Regex.Escape))})(?<!\w\B)", RegexOptions.Compiled | RegexOptions.IgnoreCase);
The $@"(?!\B\w)(?:{string.Join("|", keywords.OrderByDescending(x => x.Length).Select(Regex.Escape))})(?<!\w\B)" is an interpolated verbatim string literal that contains
(?!\B\w) - left-hand adaptive dynamic word boundary (a word boundary is required only if the keyword starts with a word char)
(?: - start of a non-capturing group:
{string.Join("|", keywords.OrderByDescending(x => x.Length).Select(Regex.Escape))} - sorts the keywords by length in descending order, escapes them and joins with |
) - end of the group
(?<!\w\B) - right-hand adaptive dynamic word boundary (a word boundary is required only if the keyword ends with a word char).
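The adaptive boundaries can be tried out with the special-char keywords listed above. This is only a sketch: the class `AdaptiveBoundaryDemo`, the helper `FindKeywords`, and the sample input string are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class AdaptiveBoundaryDemo
{
    // Hypothetical helper: wraps the escaped, longest-first alternation in
    // the adaptive boundaries (?!\B\w) and (?<!\w\B) and returns the matched values.
    public static List<string> FindKeywords(string input, IEnumerable<string> keywords)
    {
        var alternation = string.Join("|", keywords.OrderByDescending(x => x.Length).Select(Regex.Escape));
        var regex = new Regex($@"(?!\B\w)(?:{alternation})(?<!\w\B)", RegexOptions.IgnoreCase);
        return regex.Matches(input).Cast<Match>().Select(m => m.Value).ToList();
    }

    public static void Main()
    {
        var keywords = new[] { "$foo", "^bar^", "[baz]", "(boo)", "tree+", "+car" };

        // "$foobar" is rejected on the right ("o" is followed by a word char),
        // "xtree+" is rejected on the left ("t" is preceded by a word char).
        var found = FindKeywords("$foo $foobar xtree+ tree+ +car", keywords);
        Console.WriteLine(string.Join(", ", found));
        // $foo, tree+, +car
    }
}
```

A plain \b would not work for these keywords: there is no word boundary between the start of the string and "$", so \b\$foo\b would fail on the leading "$foo" that the adaptive version matches.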