Regex - catching repeated numbers (not digits)

Question

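I am trying to write a method in Java that checks a String and allows it to contain only digits and commas. In addition, no number may be repeated.

For example:

11,22,33 - this is OK
22,22,33 - this is not OK

I have written a first draft using a combination of a regex and a Set<String> (below), but I am looking for something better, preferably using only a regex.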

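import java.util.LinkedHashSet;
import java.util.Set;

public boolean isStringOk(String codes) {
    // Reject anything that is not made up of digits and commas.
    if (codes.matches("^[0-9,]+$")) {
        Set<String> nonRepeatingCodes = new LinkedHashSet<String>();
        for (String c : codes.split(",")) {
            // A number that is already in the set is a repeat.
            if (nonRepeatingCodes.contains(c)) {
                return false;
            } else {
                nonRepeatingCodes.add(c);
            }
        }
        return true;
    }
    return false;
}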

Does anyone know whether this is possible using only a regex?

Answer 1

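I doubt that this is advisable (as Jarrod Roberson mentioned), because it will be hard for any coder on your project to understand. But it is possible with a regex alone: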

^(?:(\d+)(?!.*,\1(?!\d)),)*\d+$

The nested negative lookaheads (a logical double negative) make it a bit hard to follow. Here is an explanation:

^                # anchor the regex to the beginning of the string
(?:              # subpattern that matches all numbers, but the last one and all commas
    (\d+)        # capturing group \1, a full number
    (?!          # negative lookahead, that asserts that this number does not occur again
        .*       # consume as much as you want, to look through the whole string
        ,        # match a comma
        \1       # match the number we have already found
        (?!\d)   # make sure that the number has ended (so we don't get false negatives)
    )            # end of lookahead
    ,            # match the comma
)*               # end of subpattern, repeat 0 or more times
\d+              # match the last number
$                # anchor the regex to the end of the string
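
The commented layout above can also be kept in the source code itself. The following is a minimal sketch, assuming Java's Pattern.COMMENTS flag (which ignores whitespace in the pattern and treats # as the start of a comment); the class name CommentedRegex and the wording of the comments are illustrative only.

```java
import java.util.regex.Pattern;

final class CommentedRegex {
    // Same pattern as above, written in free-spacing mode so the
    // explanation lives next to each piece of the regex.
    static final Pattern NO_REPEATS = Pattern.compile(
            "^                 # start of the string\n"
          + "(?:               # every number except the last, each with its comma\n"
          + "    (\\d+)        # group 1: a full number\n"
          + "    (?!           # assert that this number does not occur again\n"
          + "        .*,\\1    # some later comma followed by the same digits\n"
          + "        (?!\\d)   # and the later number ends right there\n"
          + "    )\n"
          + "    ,             # the comma after the number\n"
          + ")*\n"
          + "\\d+              # the last number\n"
          + "$                 # end of the string\n",
            Pattern.COMMENTS);

    public static void main(String[] args) {
        System.out.println(NO_REPEATS.matcher("11,22,33").matches());
        System.out.println(NO_REPEATS.matcher("22,22,33").matches());
    }
}
```

This does not make the lookahead logic any simpler, but it may ease the readability concern for whoever maintains the code later.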

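Note that this is just the general regex, not specific to Java. In Java you need to escape every backslash, otherwise it will not make it through to the regex engine: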

^(?:(\\d+)(?!.*,\\1(?!\\d)),)*\\d+$
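
As a rough sketch of how the escaped pattern might be dropped into the method from the question: the class name NoRepeatLookahead and the sample inputs are assumptions for illustration, and the pattern is precompiled only to avoid recompiling it on every call.

```java
import java.util.regex.Pattern;

final class NoRepeatLookahead {
    // The regex from above, with every backslash doubled for a Java string literal.
    private static final Pattern NO_REPEATS =
            Pattern.compile("^(?:(\\d+)(?!.*,\\1(?!\\d)),)*\\d+$");

    // True if codes is a comma-separated list of numbers with no number repeated.
    static boolean isStringOk(String codes) {
        return NO_REPEATS.matcher(codes).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"11,22,33", "22,22,33", "1,11", "11,22,11"}) {
            System.out.println(s + " -> " + isStringOk(s));
        }
    }
}
```

Numbers are compared as text here, so 01 and 1 would count as different numbers.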

Answer 2

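Note that using regular expressions for languages that are technically not regular can be dangerous, especially with large, non-matching strings. If you are not careful, you can introduce exponential time complexity. The regex engine also has to do some behind-the-scenes trickery to support such patterns, which can slow it down as well.

If you try the other solutions and they give you problems, you could try it this way instead, using a capturing group together with the Pattern and Matcher classes to keep your code clearer: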

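import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Each match is one run of digits, optionally followed by a comma.
private static final Pattern PATTERN = Pattern.compile("([\\d]+),?");

public static boolean isValid(String str) {
    Matcher matcher = PATTERN.matcher(str);
    Set<Integer> found = new HashSet<Integer>();
    while (matcher.find()) {
        // Set.add returns false if the number was already seen.
        if (!found.add(Integer.parseInt(matcher.group(1))))
            return false;
    }
    return true;
}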
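
One thing to keep in mind with the loop above: Matcher.find() skips over any characters that are not part of a match, so the loop only detects repeats and does not reject stray characters, and Integer.parseInt throws for numbers that do not fit in an int. The following is a hedged sketch of one way to cover both points, assuming the format check can be done up front and that numbers may be compared as strings; the names CodeListValidator and FORMAT are made up for this example.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Pattern;

final class CodeListValidator {
    // Assumed format: one or more digit runs separated by single commas.
    private static final Pattern FORMAT = Pattern.compile("\\d+(?:,\\d+)*");

    static boolean isValid(String str) {
        if (!FORMAT.matcher(str).matches()) {
            return false; // stray characters, empty items, leading or trailing commas
        }
        Set<String> seen = new HashSet<>();
        for (String number : str.split(",")) {
            // Set.add returns false when the element is already present.
            if (!seen.add(number)) {
                return false;
            }
        }
        return true;
    }
}
```

Because the numbers are compared as strings, 01 and 1 count as different here; parsing them first would be the alternative if they should be treated as equal.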

Answer 3

Here is the ugliest regex I could come up with:

return codes.matches("^(?:,?(\\d+)(?=(?:,(?!\\1\\b)\\d+)*$))+$");

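Breakdown:

,? consumes the next comma if there is one (i.e., if we are not at the beginning of the string).
(\\d+) captures the next number in group #1.
(?=(?:,(?!\\1\\b)\\d+)*$) tries to match the remaining numbers, checking each one to make sure it is not the same as the number just captured.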
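
To see the pieces of the breakdown working together, here is a minimal sketch that compiles the regex and runs it on a few inputs. The class name SingleLookahead and the sample strings are assumptions made for illustration.

```java
import java.util.regex.Pattern;

final class SingleLookahead {
    // For each number: capture it, then look ahead over all remaining
    // numbers and require that none of them equals the captured one.
    static final Pattern NO_REPEATS =
            Pattern.compile("^(?:,?(\\d+)(?=(?:,(?!\\1\\b)\\d+)*$))+$");

    static boolean isStringOk(String codes) {
        return NO_REPEATS.matcher(codes).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"11,22,33", "22,22,33", "11,111", "1,2,1"}) {
            System.out.println(s + " -> " + isStringOk(s));
        }
    }
}
```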

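The \\b after the backreference prevents false positives on strings like 11,111. It is not needed anywhere else, but you can add one after each \\d+ if you like, which might make the regex slightly more efficient. If you need to tune the regex for maximum performance, though, making all the quantifiers possessive will have a bigger effect: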

"^(?:,?+(\\d++)(?=(?:,(?!\\1\\b)\\d++)*+$))++$"

Answer 4

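This regex will do it:

^(?=^\d+(?:,\d+)*$)(?!^.*?((?<=(?:^|,))\d+(?=[,$])).*?,\1(?:$|,.*?)).*?$

(?=^\\d+(?:,\\d+)*$) checks for a valid format, such as 45 or 556,88,33.

(?!^.*?((?<=(?:^|,))\\d+(?=[,$])).*?,\\1(?:$|,.*?)) fails the match if any number is repeated. The comma before \\1 and the (?:$|,.*?) after it make sure that a whole number is compared, so 22,122 is not treated as a repeat of 22.

.*? matches everything if the negative lookahead above succeeds.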

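
A minimal sketch for trying this approach in Java. Note the comma in front of the backreference \\1: it makes the repeat check compare whole numbers, so an input such as 22,122 is not rejected as a repeat of 22. The class name TwoLookaheads and the sample inputs are assumptions for illustration.

```java
import java.util.regex.Pattern;

final class TwoLookaheads {
    static final Pattern NO_REPEATS = Pattern.compile(
            "^(?=^\\d+(?:,\\d+)*$)"                                     // format check
          + "(?!^.*?((?<=(?:^|,))\\d+(?=[,$])).*?,\\1(?:$|,.*?))"      // no number repeated
          + ".*?$");                                                    // consume the string

    static boolean isStringOk(String codes) {
        return NO_REPEATS.matcher(codes).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"45", "556,88,33", "22,22,33", "22,122"}) {
            System.out.println(s + " -> " + isStringOk(s));
        }
    }
}
```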


Answer 1
6 2012-11-12 17:59:19

Answer 2
2 2012-11-12 18:10:57

Answer 3
1 2012-11-12 19:48:28

Answer 4
0 2012-11-12 18:23:28
