Regex - capturing repeated numbers (not digits)

Question

I'm trying to write a method in Java that checks a String and allows it to contain only numbers and commas. In addition, no number may be repeated.

For instance:

11,22,33 - this is ok
22,22,33 - this is NOT ok

I've done a first draft of it using a combination of regex and Set<String> (below), but was looking for something better, preferably using regex only.

import java.util.LinkedHashSet;
import java.util.Set;

public boolean isStringOk(String codes) {
    // Only digits and commas are allowed
    if (codes.matches("^[0-9,]+$")) {
        Set<String> nonRepeatingCodes = new LinkedHashSet<String>();
        for (String c : codes.split(",")) {
            // A code that is already in the set is a repeat
            if (nonRepeatingCodes.contains(c)) {
                return false;
            }
            else {
                nonRepeatingCodes.add(c);
            }
        }
        return true;
    }
    return false;
}

Does anyone know if this is possible using regex only?

Answer 1

I doubt that it is advisable (as Jarrod Roberson mentioned), since it is hard to understand for any fellow coder on your project. But it is definitely possible with regex only:

^(?:(\d+)(?!.*,\1(?!\d)),)*\d+$

The nested negative lookaheads make it a bit difficult to understand, but here is an explanation:

^                # anchor the regex to the beginning of the string
(?:              # subpattern that matches every number except the last, each followed by its comma
    (\d+)        # capturing group \1, a full number
    (?!          # negative lookahead, that asserts that this number does not occur again
        .*       # consume as much as you want, to look through the whole string
        ,        # match a comma
        \1       # match the number we have already found
        (?!\d)   # make sure that the number has ended (so we don't get false negatives)
    )            # end of lookahead
    ,            # match the comma
)*               # end of subpattern, repeat 0 or more times
\d+              # match the last number
$                # anchor the regex to the end of the string

Note that the pattern above is the general regex, not specific to Java. In a Java string literal you need to escape every backslash, otherwise it won't get through to the regex engine:

^(?:(\\d+)(?!.*,\\1(?!\\d)),)*\\d+$
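
As a minimal sketch of how the escaped pattern might be used from Java, the expression can be compiled once and reused. The class name UniqueNumbersRegex and the method hasUniqueNumbers are made up for this illustration and are not part of the answer:

```java
import java.util.regex.Pattern;

class UniqueNumbersRegex {
    // The escaped pattern from above, compiled once so it can be reused.
    private static final Pattern UNIQUE_NUMBERS =
            Pattern.compile("^(?:(\\d+)(?!.*,\\1(?!\\d)),)*\\d+$");

    // Hypothetical helper: true when the input is a comma-separated list
    // of numbers in which no number appears twice.
    static boolean hasUniqueNumbers(String codes) {
        return UNIQUE_NUMBERS.matcher(codes).matches();
    }

    public static void main(String[] args) {
        System.out.println(hasUniqueNumbers("11,22,33")); // true
        System.out.println(hasUniqueNumbers("22,22,33")); // false
        System.out.println(hasUniqueNumbers("11,111"));   // true: 11 and 111 differ
    }
}
```

The last call is the case the inner (?!\d) is there for: without it, the 11 at the start of 111 would be mistaken for a repeat.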

Answer 2

Be warned that using regular expressions for what's technically a non-regular language can be dangerous, especially for large, non-matching strings. You can introduce exponential time complexity if you're not careful. Also, regular expression engines have to do some back-door tricks to support this, which can also slow down the engine.

If you try the other solutions and they give you problems, you can try it this way using a capture group along with the Pattern and Matcher classes to make your code cleaner:

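import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// One number, optionally followed by a comma; group 1 is the number
private static final Pattern PATTERN = Pattern.compile("([\\d]+),?");

public static boolean isValid(String str) {
    Matcher matcher = PATTERN.matcher(str);
    Set<Integer> found = new HashSet<Integer>();
    while (matcher.find()) {
        // Set.add returns false if the number was already seen
        if (!found.add(Integer.parseInt(matcher.group(1))))
            return false;
    }
    return true;
}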
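
One thing to keep in mind with the loop above: Matcher.find() skips over any text that does not match, so on its own it does not reject input containing other characters, and Integer.parseInt throws on numbers too large for an int. A possible variant is sketched below. It assumes numbers should be compared as strings exactly as written (so 01 and 1 count as different), and the names CodeListValidator, FORMAT and isValid are hypothetical:

```java
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Pattern;

class CodeListValidator {
    // Format only: runs of digits separated by single commas,
    // with no leading or trailing comma.
    private static final Pattern FORMAT = Pattern.compile("\\d+(?:,\\d+)*");

    static boolean isValid(String str) {
        if (!FORMAT.matcher(str).matches()) {
            return false;
        }
        Set<String> seen = new HashSet<>();
        for (String number : str.split(",")) {
            // Set.add returns false when the value was already present.
            if (!seen.add(number)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValid("11,22,33")); // true
        System.out.println(isValid("22,22,33")); // false
        System.out.println(isValid("11,,22"));   // false: empty entry
        System.out.println(isValid("11,a"));     // false: not a number
    }
}
```

Keeping the format check and the duplicate check separate means the regex stays simple and the duplicate check runs in a single pass over the entries.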

Answer 3

Here's the least-ugly regex I could come up with:

return codes.matches("^(?:,?(\\d+)(?=(?:,(?!\\1\\b)\\d+)*$))+$");

Breakdown:

,? consumes the next comma if there is one (i.e., it's not the beginning of the string).
(\\d+) captures the next number in group #1.
(?=(?:,(?!\\1\\b)\\d+)*$) tries to match the remaining numbers, checking each one to make sure it's not the same as the one that was just captured.

The \\b after the backreference prevents false positives on strings like 11,111. It isn't needed anywhere else, but you can tack one onto each \\d+ if you want, and it might make the regex slightly more efficient. But if you need to tweak the regex for maximum performance, making all the quantifiers possessive will have more effect:

"^(?:,?+(\\d++)(?=(?:,(?!\\1\\b)\\d++)*+$))++$"

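To compare the two variants, here is a small sketch that runs both patterns over the same inputs. The class name LookaheadVariants and the helper matches are invented for this example:

```java
import java.util.regex.Pattern;

class LookaheadVariants {
    // The pattern from this answer with ordinary (greedy) quantifiers.
    static final Pattern GREEDY = Pattern.compile(
            "^(?:,?(\\d+)(?=(?:,(?!\\1\\b)\\d+)*$))+$");

    // The same pattern with every quantifier made possessive.
    static final Pattern POSSESSIVE = Pattern.compile(
            "^(?:,?+(\\d++)(?=(?:,(?!\\1\\b)\\d++)*+$))++$");

    // Hypothetical helper: whole-string match against the given pattern.
    static boolean matches(Pattern pattern, String input) {
        return pattern.matcher(input).matches();
    }

    public static void main(String[] args) {
        String[] inputs = {"11,22,33", "22,22,33", "11,111", ",11"};
        for (String input : inputs) {
            System.out.println(input
                    + " greedy=" + matches(GREEDY, input)
                    + " possessive=" + matches(POSSESSIVE, input));
        }
    }
}
```

Both variants give the same answers on these inputs. Note that both also accept a leading comma (for example ,11), because the optional comma is allowed on the first repetition as well; add a separate check if that matters.
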
Answer 4

This regex would do it:

^(?=^\d+(?:,\d+)*$)(?!^.*?((?<=(?:^|,))\d+(?=[,$])).*?\1(?:$|,.*?)).*?$

(?=^\\d+(?:,\\d+)*$) checks for a valid format like 45 or 55,66,88,33

(?!^.*?((?<=(?:^|,))\\d+(?=[,$])).*?\\1(?:$|,.*?)) fails the match if any number is repeated

.*? matches everything, provided the negative lookahead above succeeds
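
A quick way to exercise this pattern from Java is sketched below; the class name TwoLookaheadCheck and the method isValid are invented for the example. One caveat when tracing it by hand: the backreference \1 is not required to start right after a comma, so an input such as 11,211 is rejected as well, even though 11 and 211 are different numbers.

```java
import java.util.regex.Pattern;

class TwoLookaheadCheck {
    // The pattern from this answer, with backslashes escaped for Java.
    static final Pattern PATTERN = Pattern.compile(
            "^(?=^\\d+(?:,\\d+)*$)"
            + "(?!^.*?((?<=(?:^|,))\\d+(?=[,$])).*?\\1(?:$|,.*?))"
            + ".*?$");

    // Hypothetical helper: whole-string match against the pattern.
    static boolean isValid(String input) {
        return PATTERN.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("45"));          // true
        System.out.println(isValid("55,66,88,33")); // true
        System.out.println(isValid("22,22,33"));    // false: 22 is repeated
        System.out.println(isValid("11,211"));      // false, although nothing repeats
    }
}
```

If inputs like 11,211 can occur, requiring a comma (or the start of the string) immediately before the backreference would be one way to tighten the check.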


4 answers

Answer 1
6 2012-11-12 17:59:19

Answer 2
2 2012-11-12 18:10:57

Answer 3
1 2012-11-12 19:48:28

Answer 4
0 2012-11-12 18:23:28
