

Unique sequence of similar characters in a string using regex

I have some test strings:

  1. "x"
  2. " mm "
  3. "x mm"
  4. "yy x mm"
  5. "xx mm y mm"

I want to write a regex that matches strings 1, 2, 3 and 4 but not 5.

So my constraints for a match are:

  1. Each letter may form only one sequence in the string. (E.g. "y" is a sequence of one y and "yy" is a sequence of two y's; since they consist of the same letter they conflict and can't both occur.)
  2. Only specific letters are allowed in the string (in my case x, y and m).
  3. A sequence can occur at the start, middle or end of the string, but it must be preceded by a non-word character if another letter sequence comes before it, and followed by a non-word character if another letter sequence comes after it.
  4. Not all of the letter sequences have to be present in the string.
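The question mentions an existing solution built from separate regexes and iteration but does not show it. As a reference point for the four constraints, here is a hypothetical stand-in (the names `ALLOWED` and `is_valid` are introduced here, not taken from the question). It assumes "non-word character" means anything matched by `\W`: it splits on runs of `\W`, requires every remaining token to be a run of a single allowed letter, and rejects the string if a letter appears in more than one token.

```python
import re

ALLOWED = set("xym")  # hypothetical: the letters permitted by constraint 2


def is_valid(s: str) -> bool:
    """Return True if s satisfies the four constraints (sketch, not the asker's code)."""
    # Split on non-word characters; drop empty pieces from leading/trailing separators.
    tokens = [t for t in re.split(r"\W+", s) if t]
    seen = set()
    for tok in tokens:
        letter = tok[0]
        # Constraints 2 and 3: a token must be a run of one allowed letter, e.g. "x" or "mm".
        if letter not in ALLOWED or set(tok) != {letter}:
            return False
        # Constraint 1: the same letter may not form a second sequence.
        if letter in seen:
            return False
        seen.add(letter)
    # Constraint 4: any subset of the letters may be present.
    return True


tests = ["x", " mm ", "x mm", "yy x mm", "xx mm y mm"]
print([is_valid(t) for t in tests])  # [True, True, True, True, False]
```

Because it splits on `\W+`, this version also accepts separators such as commas or hyphens, which is how constraint 3 reads literally.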

Note: I want a single regex to solve this problem. I have already solved it with separate regexes and iteration; I am looking for a one-line solution to validate my string.

The solution I have tried is:

/(?=^[xym\W]+$)((?=^([^m]*\W)?m+(\W[^m]*)?$)|(?=^([^x]*\W)?x+(\W[^x]*)?$)|(?=^([^y]*\W)?y+(\W[^y]*)?$))/

But it also matches the 5th case.
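Why case 5 slips through: the three lookaheads are joined with `|`, so the pattern succeeds as soon as any one letter forms a single run. In "xx mm y mm" the `x` branch (and the `y` branch) is satisfied, so the failing `m` branch never matters. The sketch below uses Python's `re` to test each branch on its own. It then tries one possible repair, written for illustration and not taken from the thread: give every letter its own lookahead that also accepts the letter being absent (`^[^m]*$`), and concatenate the lookaheads so all of them must hold (AND) instead of any one of them (OR).

```python
import re

# The attempt from the question, unchanged (split over two raw strings).
ATTEMPT = re.compile(
    r"(?=^[xym\W]+$)((?=^([^m]*\W)?m+(\W[^m]*)?$)"
    r"|(?=^([^x]*\W)?x+(\W[^x]*)?$)|(?=^([^y]*\W)?y+(\W[^y]*)?$))"
)

case5 = "xx mm y mm"
print(bool(ATTEMPT.search(case5)))  # True: case 5 is wrongly accepted

# Evaluate each alternative separately.
for letter in "mxy":
    branch = rf"^([^{letter}]*\W)?{letter}+(\W[^{letter}]*)?$"
    print(letter, bool(re.search(branch, case5)))
# m False / x True / y True: one succeeding branch is enough for the OR.


def letter_rule(c: str) -> str:
    # Illustrative: "c is absent, or c forms exactly one run bounded by
    # non-word characters or the string edges".
    return rf"(?=^[^{c}]*$|^(?:[^{c}]*\W)?{c}+(?:\W[^{c}]*)?$)"


# Every per-letter lookahead must succeed at the start of the string.
FIXED = re.compile(r"(?=^[xym\W]+$)" + "".join(letter_rule(c) for c in "xym"))

tests = ["x", " mm ", "x mm", "yy x mm", "xx mm y mm"]
print([bool(FIXED.match(s)) for s in tests])  # [True, True, True, True, False]
```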

You may use

/^(?!.*\b([xym])\1*\b.*\b\1+\b)(?:\s*\b([xym])\2*\b)*\s*$/

See the regex demo.
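The linked demo did not survive on this page. As a local substitute, the same pattern can be run with Python's `re` module; this assumes the constructs the answer relies on (`\b`, backreferences, lookaheads) behave the same way in Python as in the engine the answer targeted, which holds for these inputs.

```python
import re

# The pattern from the answer, unchanged.
ANSWER = re.compile(r"^(?!.*\b([xym])\1*\b.*\b\1+\b)(?:\s*\b([xym])\2*\b)*\s*$")

tests = ["x", " mm ", "x mm", "yy x mm", "xx mm y mm"]
for s in tests:
    # Prints each test string and whether the pattern accepts it.
    print(repr(s), bool(ANSWER.search(s)))
```

Expected output: the first four strings print `True`, and `'xx mm y mm'` prints `False`. Because the separator part is `\s*`, only whitespace is accepted between sequences, so a string such as "x,mm" does not match even though constraint 3 speaks of non-word characters in general.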

Details

  • ^ - start of string
  • (?!.*\b([xym])\1*\b.*\b\1+\b) - a negative lookahead that fails the match if, immediately after the start of the string, there is
    • .* - any 0+ chars other than line break chars, as many as possible
    • \b([xym])\1*\b - a whole word that consists of identical chars, x, y or m (the char is captured in Group 1)
    • .* - any 0+ chars other than line break chars, as many as possible
    • \b\1+\b - a whole word that consists of one or more copies of the char captured in Group 1
  • (?:\s*\b([xym])\2*\b)* - 0 or more repetitions of
    • \s* - 0 or more whitespace chars
    • \b([xym])\2*\b - a whole word that consists of 1 or more of the same char, x, y or m (captured in Group 2)
  • \s* - 0 or more whitespace chars
  • $ - end of string.
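The breakdown above can be mirrored in code by writing the same pattern in verbose mode, with one comment per bullet. This is a sketch using Python's `re.VERBOSE` flag; the names `ANSWER_VERBOSE` and `DUPLICATE` are introduced here for illustration. It also runs the inner part of the negative lookahead on its own to show which letter causes case 5 to be rejected.

```python
import re

ANSWER_VERBOSE = re.compile(
    r"""
    ^                      # start of string
    (?!                    # negative lookahead: fail the match if there is
        .*                 #   any chars other than line breaks
        \b([xym])\1*\b     #   a whole word of identical chars x, y or m (Group 1)
        .*                 #   any chars other than line breaks
        \b\1+\b            #   a later whole word made only of the Group 1 char
    )
    (?:                    # 0 or more repetitions of
        \s*                #   0 or more whitespace chars
        \b([xym])\2*\b     #   a whole word of one repeated char (Group 2)
    )*
    \s*                    # 0 or more whitespace chars
    $                      # end of string
    """,
    re.VERBOSE,
)

# The body of the negative lookahead, used alone to find a letter that repeats.
DUPLICATE = re.compile(r"\b([xym])\1*\b.*\b\1+\b")

tests = ["x", " mm ", "x mm", "yy x mm", "xx mm y mm"]
print([bool(ANSWER_VERBOSE.search(s)) for s in tests])  # [True, True, True, True, False]
print(DUPLICATE.search("xx mm y mm").group(1))           # m
```

In verbose mode unescaped whitespace and `#` comments are ignored, so this compiles to the same pattern as the one-line version.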

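Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you republish this page, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.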

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM