
Java regex positive look-ahead but match unique characters only?

I'm trying to match a String input against the criteria below:

  1. The first characters are unique lowercase English letters
  2. The next characters represent the current year, from 1500 to 2020
  3. The next characters can only be 10, 100, or 1000
  4. The last character is a digit, 0 through 9

The regex string I have created, which I believe is mostly correct, is shown below with explanations:

String validRegex =
    "^" +                                   // start of string
    "(?=.*[a-z].*[a-z].*[a-z])" +           // Ensure string has only 3 consecutive lowercase English letters
    "(?=.*[0-9].*[0-9].*[0-9].*[0-9])" +    // Ensure string has only 4 digits representing year i.e. 2020
    "(?=.*([0-9].*[0-9]) | ([0-9].*[0-9].*[0-9]) | ([0-9].*[0-9].*[0-9].*[0-9]))" + // Ensure 10, 100, or 1000 digits
    "(?=.*[0-9])" +                         // Ensure last character is a digit 0-9
    "(?=\\S+$)" +                           // Ensure string has no whitespace
    ".{10,12}" +                            // Entire string length must be from 10 through 12 characters
    "$";                                    // end of string

Is there a simple way to update my regex so that it only accepts unique consecutive characters?

Look:

  • The entire input (String) length will always be from 10 to 12 characters - ^.{10,12}$ (HOWEVER, in this case, you do not need to add this to the overall pattern because the parts below always add up to 10, 11 or 12 characters)
  • The first 3 characters are UNIQUE lowercase English letters ([a-z]) - ^([a-z])(?!\\1)([a-z])(?!\\1|\\2)[a-z]
  • The next 4 characters represent the current year, from 1500 to 2020, e.g. 2020 - (?:1[5-9][0-9]{2}|20[01][0-9]|2020)
  • The next characters can only be 10, 100, or 1000 (so at minimum 2 chars, i.e. 10, and at most 4 chars, i.e. 1000) - [0-9]{2,4}
  • The last character will be a digit 0 through 9 - [0-9]

Joining these bits, you get

String regex = "^([a-z])(?!\\1)([a-z])(?!\\1|\\2)[a-z](?:1[5-9][0-9]{2}|20[01][0-9]|2020)[0-9]{2,4}[0-9]$";

See the regex demo.
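As a quick sanity check of the joined pattern, here is a minimal sketch that runs it against a few inputs. The class name `CodeValidator`, the method `isValid`, and the sample strings are made up for illustration; only the regex comes from the answer.

```java
import java.util.regex.Pattern;

class CodeValidator {
    // Pattern from the answer: 3 distinct lowercase letters, a year in 1500..2020,
    // 2 to 4 digits, and one closing digit.
    static final Pattern CODE = Pattern.compile(
        "^([a-z])(?!\\1)([a-z])(?!\\1|\\2)[a-z](?:1[5-9][0-9]{2}|20[01][0-9]|2020)[0-9]{2,4}[0-9]$");

    // Hypothetical helper: true when the whole input satisfies the pattern.
    static boolean isValid(String s) {
        return CODE.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "abc2020105",   // distinct letters, year 2020, "10", closing digit
            "aab2020105",   // second letter repeats the first
            "aba2020105",   // third letter repeats the first
            "abc1499105",   // year below 1500
            "abc2021105",   // year above 2020
            "abc202010005"  // 12 characters, "1000" plus closing digit
        };
        for (String s : samples) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```

Note that [0-9]{2,4} accepts any 2 to 4 digits, so this pattern checks the length of that part rather than the exact values 10, 100 and 1000.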

If you plan to support both lowercase and uppercase letters, add the case-insensitive modifier (?i) at the start:

String regex = "(?i)^([a-z])(?!\\1)([a-z])(?!\\1|\\2)[a-z](?:1[5-9][0-9]{2}|20[01][0-9]|2020)[0-9]{2,4}[0-9]$";
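One side effect worth knowing: as far as I can tell, in Java the (?i) flag also applies to backreferences, so the uniqueness check becomes case-insensitive and an input starting with "aA" is treated as a repeated letter. A small sketch to confirm this on your own JVM (the class name `CaseInsensitiveCheck` and the sample inputs are assumptions for illustration):

```java
import java.util.regex.Pattern;

class CaseInsensitiveCheck {
    // Same pattern as above, with the (?i) modifier in front.
    static final Pattern CODE_CI = Pattern.compile(
        "(?i)^([a-z])(?!\\1)([a-z])(?!\\1|\\2)[a-z](?:1[5-9][0-9]{2}|20[01][0-9]|2020)[0-9]{2,4}[0-9]$");

    // Hypothetical helper: true when the whole input satisfies the pattern.
    static boolean isValid(String s) {
        return CODE_CI.matcher(s).matches();
    }

    public static void main(String[] args) {
        // Mixed-case distinct letters, then the same letter in two different cases.
        for (String s : new String[] {"AVG1904205", "avG1904205", "aAb1904205"}) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```

If you want "a" and "A" to count as different letters, this modifier is not the right tool; widening the classes to [a-zA-Z] without (?i) would be one alternative.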

If there can be a letter at the end, not just a digit, you may use

String regex = "(?i)^([a-z])(?!\\1)([a-z])(?!\\1|\\2)[a-z](?:1[5-9][0-9]{2}|20[01][0-9]|2020)[0-9]{2,4}[0-9a-z]$";

See this regex demo.

To create regex number ranges, you may use well-known services such as gamon.webfactional.com, richie-bendall.ml, or MyRegexTester.com.
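Whichever tool produces the range, it is cheap to verify the result by brute force. The sketch below checks every 4-digit string against an expected interval; `YearRangeCheck` and `matchesExactly` are hypothetical names, not part of any library.

```java
import java.util.Locale;
import java.util.regex.Pattern;

class YearRangeCheck {
    // Year sub-pattern used in the answer.
    static final Pattern YEAR = Pattern.compile("1[5-9][0-9]{2}|20[01][0-9]|2020");

    // Returns true when p accepts exactly the zero-padded 4-digit strings
    // for the numbers lo..hi (inclusive) and rejects all other 4-digit strings.
    static boolean matchesExactly(Pattern p, int lo, int hi) {
        for (int n = 0; n <= 9999; n++) {
            String s = String.format(Locale.ROOT, "%04d", n);
            boolean expected = n >= lo && n <= hi;
            if (p.matcher(s).matches() != expected) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("Covers exactly 1500..2020: " + matchesExactly(YEAR, 1500, 2020));
    }
}
```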

See the Java demo:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

String regex = "(?i)(([a-z])(?!\\2)([a-z])(?!\\2|\\3)[a-z])(1[5-9][0-9]{2}|20[01][0-9]|2020)([0-9]{2,4})([0-9a-z])";
String s = "AVG190420T";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(s);
if (matcher.find()){
    System.out.println("Part 1: " + matcher.group(1));
    System.out.println("Part 2: " + matcher.group(4));
    System.out.println("Part 3: " + matcher.group(5));
    System.out.println("Part 4: " + matcher.group(6));
} else {
    System.out.println(s + " does not match the pattern.");
}

Output:

Part 1: AVG
Part 2: 1904
Part 3: 20
Part 4: T
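Because the backreferences shift the group numbers (the parts end up in groups 1, 4, 5 and 6), named groups may be easier to read. This is only a sketch: the group names `letters`, `year`, `amount` and `last`, and the class `NamedGroupsDemo`, are my own choices. It also uses matches() rather than find(), so the whole string has to fit the pattern.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamedGroupsDemo {
    // Named groups are still numbered, so \2 and \3 keep pointing
    // at the first and second letter inside the "letters" group.
    static final Pattern CODE = Pattern.compile(
        "(?i)(?<letters>([a-z])(?!\\2)([a-z])(?!\\2|\\3)[a-z])"
        + "(?<year>1[5-9][0-9]{2}|20[01][0-9]|2020)"
        + "(?<amount>[0-9]{2,4})"
        + "(?<last>[0-9a-z])");

    // Returns the four parts, or null when the input does not match.
    static String[] parts(String s) {
        Matcher m = CODE.matcher(s);
        if (!m.matches()) {
            return null;
        }
        return new String[] {
            m.group("letters"), m.group("year"), m.group("amount"), m.group("last")
        };
    }

    public static void main(String[] args) {
        String[] p = parts("AVG190420T");
        System.out.println(p == null ? "no match" : String.join(" | ", p));
    }
}
```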

The following regex does not use lookaheads, but it seems to validate the initial requirements more strictly:

^(abc|bcd|cde|def|efg|fgh|ghi|hij|ijk|jkl|klm|lmn|mno|nop|opq|pqr|qrs|rst|stu|tuv|uvw|vwx|wxy|xyz)(1[5-9]\d{2}|20[0-1]\d|2020)10{1,3}\d$

Online demo

The 1st group (abc|bcd|...|xyz) validates unique consecutive lowercase letters.

The 2nd group validates the year: (1[5-9]\d{2}|20[01]\d|2020) matches a year from 1500 to 2020.

The remaining numeric suffix is validated as follows:

  • 10{1,3} matches 10, 100 or 1000
  • \d matches the closing digit
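The long alternation of letter triples does not have to be typed by hand. Here is a minimal sketch that generates it and assembles the same pattern; `ConsecutiveLettersRegex`, `consecutiveTriples` and `isValid` are illustrative names, and the sample inputs are my own.

```java
import java.util.StringJoiner;
import java.util.regex.Pattern;

class ConsecutiveLettersRegex {
    // Builds "abc|bcd|...|xyz": every run of three alphabetically consecutive letters.
    static String consecutiveTriples() {
        StringJoiner joiner = new StringJoiner("|");
        for (char c = 'a'; c <= 'x'; c++) {
            joiner.add("" + c + (char) (c + 1) + (char) (c + 2));
        }
        return joiner.toString();
    }

    // Same structure as the pattern above: triple, year 1500..2020, 10/100/1000, closing digit.
    static final Pattern CODE = Pattern.compile(
        "^(" + consecutiveTriples() + ")(1[5-9]\\d{2}|20[0-1]\\d|2020)10{1,3}\\d$");

    static boolean isValid(String s) {
        return CODE.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(consecutiveTriples());
        for (String s : new String[] {"abc2020105", "abd2020105", "abc2020205", "xyz150010009"}) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```

This reading of "unique consecutive" (abc, bcd, and so on) is stricter than "three different letters", so inputs like "avg..." are rejected here while the lookahead-based pattern accepts them.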

Update
For the year range 1900..2019, the pattern is (19\d{2}|20[01]\d).
For amounts like 10, 20, 50, 100, 200, 500 and 1000, the pattern is (10{1,3}|[25]0{1,2}).
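To see exactly which values the amount pattern lets through, you can enumerate them. A small sketch, with the hypothetical names `AmountPatternCheck` and `acceptedAmounts`:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class AmountPatternCheck {
    // Amount sub-pattern from the update.
    static final Pattern AMOUNT = Pattern.compile("10{1,3}|[25]0{1,2}");

    // Collects every number in 0..9999 whose decimal form is fully matched by AMOUNT.
    static List<Integer> acceptedAmounts() {
        List<Integer> accepted = new ArrayList<>();
        for (int n = 0; n <= 9999; n++) {
            if (AMOUNT.matcher(Integer.toString(n)).matches()) {
                accepted.add(n);
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        System.out.println(acceptedAmounts());
    }
}
```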

Updated online demo
