Regex to limit words with specific combination of letters (in any order)
This one is a little complicated and somewhat out of my league. I want to go through a list of words and eliminate those that are not made up of a specific set of characters; the characters can appear in any order, and some may occur more often than others.
I want the regex to look for any words with:
e: 0 or 1 times
a: 0 or 1 times
t: 0, 1, or 2 times
For example, the following would work:
eat
tea
tate
tt
a
e
The following would not work:
eats
teas
tates
ttt
aa
ee
Lookaround regex is new to me, so I'm not 100% sure of the syntax (any answer using a lookaround, with an explanation, would be awesome). My best guess so far:
Regex regex = new Regex(@"(?=.*e)(?=.*a)(?=.*t)");
lines = lines.Where(x => regex.IsMatch(x)).ToArray(); // 'lines' is an array containing the words
Sure:
\b(?:e(?!\w*e)|t(?!(?:\w*t){2})|a(?!\w*a))+\b
Explanation:
\b # Start of word
(?: # Start of group: Either match...
e # an "e",
(?!\w*e) # unless another e follows within the same word,
| # or
t # a "t",
(?! # unless...
(?:\w*t){2} # two more t's follow within the same word,
) #
| # or
a # an "a"
(?!\w*a) # unless another a follows within the same word.
)+ # Repeat as needed (at least one letter)
\b # until we reach the end of the word.
Test it live on regex101.com.
(I've used the \w character class for simplicity's sake; if you want to define your allowed "word characters" differently, replace it accordingly.)
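As a rough sketch of how this pattern might be dropped into the filtering code from the question: the class name `WordFilter` and the sample array are made up for illustration, and each element is assumed to be a single word made of \w characters only (so any match has to span the whole string).

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class WordFilter
{
    // Pattern from the answer above: each letter is accepted only if the
    // negative lookahead does not find too many further copies of it.
    static readonly Regex Allowed =
        new Regex(@"\b(?:e(?!\w*e)|t(?!(?:\w*t){2})|a(?!\w*a))+\b");

    // Assumption: every input string is one word of \w characters, so the
    // only word boundaries are at its start and end.
    public static string[] Filter(string[] words) =>
        words.Where(w => Allowed.IsMatch(w)).ToArray();

    static void Main()
    {
        string[] lines = { "eat", "tea", "tate", "tt", "a", "e",
                           "eats", "teas", "tates", "ttt", "aa", "ee" };
        // Prints: eat, tea, tate, tt, a, e
        Console.WriteLine(string.Join(", ", Filter(lines)));
    }
}
```

If a line can hold several words or punctuation, IsMatch would report a hit as soon as any one word qualifies, so the input would need to be split into words first.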
This is probably the same as the others; I haven't formatted those to find out.
Note that assertions are forced to match: they can't be optional (unless specifically made optional, but what for?), and they are not directly affected by backtracking.
This works; the explanation is in the formatted regex.
Updated
To use a whitespace boundary, use this:
(?<!\S)(?!\w*(?:e\w*){2})(?!\w*(?:a\w*){2})(?!\w*(?:t\w*){3})[eat]+(?!\S)
Formatted:
(?<! \S )
(?!
\w*
(?: e \w* ){2}
)
(?!
\w*
(?: a \w* ){2}
)
(?!
\w*
(?: t \w* ){3}
)
[eat]+
(?! \S )
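Here is a minimal sketch of pulling qualifying words out of running text with the whitespace-boundary pattern. The class name `WhitespaceBoundaryDemo` and the sample sentence are hypothetical, chosen only to show what the boundary does and does not accept.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class WhitespaceBoundaryDemo
{
    // Whitespace-boundary pattern from the answer, as a verbatim string.
    static readonly Regex Pattern = new Regex(
        @"(?<!\S)(?!\w*(?:e\w*){2})(?!\w*(?:a\w*){2})(?!\w*(?:t\w*){3})[eat]+(?!\S)");

    // Returns every token that passes the count checks and is delimited
    // by whitespace (or the string edges) on both sides.
    public static string[] Extract(string text) =>
        Pattern.Matches(text).Cast<Match>().Select(m => m.Value).ToArray();

    static void Main()
    {
        // Prints: eat, tate, a
        Console.WriteLine(string.Join(", ", Extract("eat teas tate ttt tea-time a")));
    }
}
```

Here "tea-time" is skipped because the hyphen after "tea" is a non-whitespace character; a \b boundary would have accepted "tea" at that spot.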
To use an ordinary word boundary, use this:
\b(?!\w*(?:e\w*){2})(?!\w*(?:a\w*){2})(?!\w*(?:t\w*){3})[eat]+\b
Formatted:
\b # Word boundary
(?! # Lookahead, assert Not 2 'e' s
\w*
(?: e \w* ){2}
)
(?! # Lookahead, assert Not 2 'a' s
\w*
(?: a \w* ){2}
)
(?! # Lookahead, assert Not 3 't' s
\w*
(?: t \w* ){3}
)
# At this point all the checks pass,
# all that's left is to match the letters.
# -------------------------------------------------
[eat]+ # 1 or more of these, Consume letters 'e' 'a' or 't'
\b # Word boundary
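Since each lookahead just rejects one occurrence more than the allowed maximum, the same shape could be generated for other letters and limits. The following is only a sketch under assumptions: `LimitPattern.Build` is a hypothetical helper, not part of the answer, and it assumes every key is a plain ASCII letter so no escaping is needed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

static class LimitPattern
{
    // Hypothetical helper: for each letter with a maximum of n occurrences,
    // emit a negative lookahead that rejects n + 1 of them in the same word,
    // then match one or more of the allowed letters between word boundaries.
    public static string Build(IEnumerable<KeyValuePair<char, int>> limits)
    {
        var pairs = limits.ToList();
        string checks = string.Concat(
            pairs.Select(p => @"(?!\w*(?:" + p.Key + @"\w*){" + (p.Value + 1) + "})"));
        string letters = string.Concat(pairs.Select(p => p.Key));
        return @"\b" + checks + "[" + letters + @"]+\b";
    }

    static void Main()
    {
        var limits = new[]
        {
            new KeyValuePair<char, int>('e', 1),
            new KeyValuePair<char, int>('a', 1),
            new KeyValuePair<char, int>('t', 2),
        };
        string pattern = Build(limits);
        Console.WriteLine(pattern);                         // same text as the word-boundary regex
        Console.WriteLine(Regex.IsMatch("tate", pattern));  // True
        Console.WriteLine(Regex.IsMatch("ttt", pattern));   // False
    }
}
```

With the limits e: 1, a: 1, t: 2 this produces the word-boundary regex shown above, character for character.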