Regex to limit words with specific combination of letters (in any order)

This one is a little complicated and somewhat out of my league. I want to go through a list of words and eliminate those that are not made up only of a specific set of characters. The characters can be in any order, and some may occur more often than others.

I want the regex to look for any words with:

e 0 or 1 times
a 0 or 1 times
t 0, 1 or 2 times

For example, the following would work:

eat tea tate tt a e

The following would not work:

eats teas tates ttt aa ee

Lookaround regex is new to me, so I'm not 100% sure of the syntax (any answer using a lookaround with an explanation would be awesome). My best guess so far:

Regex regex = new Regex(@"(?=.*e)(?=.*a)(?=.*t)");
lines = lines.Where(x => regex.IsMatch(x)).ToArray(); // 'lines' is an array containing the words

Sure:

\b(?:e(?!\w*e)|t(?!(?:\w*t){2})|a(?!\w*a))+\b

Explanation:

\b             # Start of word
(?:            # Start of group: Either match...
 e             # an "e",
 (?!\w*e)      # unless another e follows within the same word,
|              # or
 t             # a "t",
 (?!           # unless...
  (?:\w*t){2}  # two more t's follow within the same word,
 )             # 
|              # or
 a             # an "a"
 (?!\w*a)      # unless another a follows within the same word.
)+             # Repeat as needed (at least one letter)
\b             # until we reach the end of the word.

Test it live on regex101.com.

(I've used the \w character class for simplicity's sake; if you want to define your allowed "word characters" differently, replace it accordingly.)
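
For the list-filtering case in the question, the pattern above could be plugged into C# roughly as follows. This is only a sketch: the names `WordFilter` and `KeepAllowed` are made up for illustration, and it assumes each array element holds a single word made of `\w` characters.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class WordFilter
{
    // Pattern from the answer above, unchanged.
    private static readonly Regex Allowed =
        new Regex(@"\b(?:e(?!\w*e)|t(?!(?:\w*t){2})|a(?!\w*a))+\b");

    // Keeps only the entries that the pattern matches.
    // Assumption: every entry is one word consisting of \w characters.
    public static string[] KeepAllowed(string[] words) =>
        words.Where(w => Allowed.IsMatch(w)).ToArray();
}
```

Called with the twelve sample words from the question, `WordFilter.KeepAllowed` should keep eat, tea, tate, tt, a and e, and drop eats, teas, tates, ttt, aa and ee.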

This is probably the same as the others; I haven't formatted those to find out.

Note that assertions must match; they can't be optional (unless specifically made optional, but what for?) and they are not directly affected by backtracking.

This works; the explanation is in the formatted regex.

Update:
To use a whitespace boundary, use this:

(?<!\S)(?!\w*(?:e\w*){2})(?!\w*(?:a\w*){2})(?!\w*(?:t\w*){3})[eat]+(?!\S)

Formatted:

 (?<! \S )
 (?!
      \w* 
      (?: e \w* ){2}
 )
 (?!
      \w* 
      (?: a \w* ){2}
 )
 (?!
      \w* 
      (?: t \w* ){3}
 )
 [eat]+ 
 (?! \S )
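
As a rough illustration of what the whitespace boundary buys you, the sketch below pulls matching tokens out of a longer string. `WhitespaceBoundedWords` and `Find` are hypothetical names, not part of any library; the pattern is the one shown above.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class WhitespaceBoundedWords
{
    // Whitespace-bounded pattern from above: a token must be preceded and
    // followed by whitespace or by the start/end of the string.
    private static readonly Regex Allowed = new Regex(
        @"(?<!\S)(?!\w*(?:e\w*){2})(?!\w*(?:a\w*){2})(?!\w*(?:t\w*){3})[eat]+(?!\S)");

    // Returns every token in the text that the pattern accepts.
    public static string[] Find(string text) =>
        Allowed.Matches(text).Cast<Match>().Select(m => m.Value).ToArray();
}
```

For the input "eat tea-t tate ttt a" this should return eat, tate and a. The token tea-t is rejected as a whole because the hyphen is a non-whitespace character next to the letters.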

To use an ordinary word boundary, use this:

\b(?!\w*(?:e\w*){2})(?!\w*(?:a\w*){2})(?!\w*(?:t\w*){3})[eat]+\b

Formatted:

 \b                     # Word boundary
 (?!                    # Lookahead, assert Not 2 'e' s
      \w* 
      (?: e \w* ){2}
 )
 (?!                    #  Lookahead, assert Not 2 'a' s
      \w* 
      (?: a \w* ){2}
 )
 (?!                    #  Lookahead, assert Not 3 't' s
      \w* 
      (?: t \w* ){3}
 )
 # At this point all the checks have passed;
 # all that's left is to match the letters.
 # -------------------------------------------------

 [eat]+                 # 1 or more of these, Consume letters 'e' 'a' or 't'
 \b                     # Word boundary
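
In .NET, a commented layout like this can be used more or less as written by passing `RegexOptions.IgnorePatternWhitespace`, which makes the engine skip unescaped whitespace and `#` comments in the pattern. The sketch below assumes that option and uses made-up names (`CommentedPattern`, `Find`); the comments inside the pattern are condensed from the ones above.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class CommentedPattern
{
    // Verbatim string, so backslashes need no doubling. Layout and
    // # comments are ignored thanks to IgnorePatternWhitespace.
    private const string Pattern = @"
        \b                        # Word boundary
        (?! \w* (?: e \w* ){2} )  # Lookahead, assert not 2 e
        (?! \w* (?: a \w* ){2} )  # Lookahead, assert not 2 a
        (?! \w* (?: t \w* ){3} )  # Lookahead, assert not 3 t
        [eat]+                    # Consume letters e, a or t
        \b                        # Word boundary
    ";

    private static readonly Regex Allowed =
        new Regex(Pattern, RegexOptions.IgnorePatternWhitespace);

    // Returns every word-bounded match found in the text.
    public static string[] Find(string text) =>
        Allowed.Matches(text).Cast<Match>().Select(m => m.Value).ToArray();
}
```

With the word-boundary version, "eat tea-t tate ttt a" should yield eat, tea, t, tate and a: the hyphen counts as a word boundary, so tea and t are matched separately, while ttt is still rejected.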
