Regular expression to find a character repeated 5 times

Question

I'm trying to build a finite state machine, and I want to check the sequence I get with a regular expression. I need to check whether the sequence has the following form:

For example:

"A,B,C,C,C,C,C,A" -> is accepted.

"A,B,C,C,C,C,A" -> is ignored.

"A,B,C,C,C,C,C,C,A" -> is ignored.

I found this post and that post, but nothing I tried works.

I tried the following: A\\B\\D{5}\\A , ABD{5}A and a couple more, but again with no success.

EDIT: I want to know whether the C character occurs exactly 5 times in a row. What comes before and after doesn't matter at all, so the sequence could also look like this:

A,A,A,F,F,R,E,D,C,C,C,C,C, ......

Don't consider the commas.

The problem is that I need to find out whether a sequence is accepted, where the sequence has a form such as A,B, C*10. I created the machine class, the state class and the event class. But now I need to know whether I have exactly 5 repetitions of C, and this is causing me a lot of problems.

EDIT: It's not working, see the code I've added.

String sequence1 = "A,B,C,C,C,C,A";
String sequence2 = "A,B,C,C,C,C,C,A";
String sequence3 = "A,B,C,C,C,C,C,C,A";
Pattern mPattern = Pattern.compile("(\\w)(?:,\\1){4}");
Matcher m1 = mPattern.matcher(sequence1);
m1.matches(); // false
Matcher m2 = mPattern.matcher(sequence2);
m2.matches(); // false
Matcher m3 = mPattern.matcher(sequence3);
m3.matches(); // false

It always returns false.

How can I achieve this?

Thanks.

Answer 1

Your regex does not work because it does not account for the commas in your string, which I assume are present.

You can try the following regex (I'm posting a generalized pattern here; you can modify it accordingly):

"(\\w)(?:,\\1){4}"

This will match any run of 5 identical characters separated by commas.

\\1 is a backreference to the first matched character; the remaining 4 characters must be the same as it.

Explanation:

"(         // 1st capture group
   \\w     // Start with a character
 )
 (?:       // Non-capturing group
    ,      // Match `,` after `C`
    \\1    // Backreference to 1st capture group. 
           // Match the same character as in (\\w)
 ){4}"     // Group close. Match 4 times 
           // As 1st one we have already matched in (\\w)
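As a minimal sketch of how this pattern might be applied: `Matcher.matches()` succeeds only when the whole input matches the pattern, whereas `Matcher.find()` looks for the pattern anywhere in the input, so searching for a run inside a longer sequence calls for `find()`. The class and method names below are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RunOfFiveDemo {
    // Same pattern as above: one word character followed by 4 repeats of ",<same character>".
    static final Pattern RUN = Pattern.compile("(\\w)(?:,\\1){4}");

    // True only if the entire input is the run (this is what matches() checks).
    static boolean wholeInputIsRun(String input) {
        return RUN.matcher(input).matches();
    }

    // True if the run occurs anywhere in the input (this is what find() checks).
    static boolean containsRun(String input) {
        return RUN.matcher(input).find();
    }

    public static void main(String[] args) {
        String[] inputs = { "A,B,C,C,C,C,A", "A,B,C,C,C,C,C,A", "A,B,C,C,C,C,C,C,A" };
        for (String s : inputs) {
            System.out.println(s + " matches=" + wholeInputIsRun(s) + " find=" + containsRun(s));
        }
    }
}
```

With `find()`, the sequence containing six `C`s is also reported, because this pattern only asks for at least five in a row.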

UPDATE:

If you want to match only a sequence of length 5, you can add a negative look-ahead for the matched character after the 5th match:

"(\\w)(?:,\\1){4}(?!,\\1)"

(?!,\\1) is a negative look-ahead assertion. The pattern matches 5 consecutive identical characters that are not followed by the same character.
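A small sketch of what the look-ahead alone does and does not cover. With six identical characters, the match can no longer start at the first one, but it can still start at the second, because nothing checks the character before the run. The helper name `firstMatchStart` is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookAheadOnlyDemo {
    // Five identical characters, not followed by ",<same character>".
    static final Pattern RUN = Pattern.compile("(\\w)(?:,\\1){4}(?!,\\1)");

    // Index at which the first match starts, or -1 if there is no match.
    static int firstMatchStart(String input) {
        Matcher m = RUN.matcher(input);
        return m.find() ? m.start() : -1;
    }

    public static void main(String[] args) {
        // Five C's: the match starts at the first C (index 4).
        System.out.println(firstMatchStart("A,B,C,C,C,C,C,A"));
        // Six C's: still a match, starting at the second C (index 6).
        System.out.println(firstMatchStart("A,B,C,C,C,C,C,C,A"));
        // Four C's: no match.
        System.out.println(firstMatchStart("A,B,C,C,C,C,A"));
    }
}
```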

UPDATE:

In the above regex, we also need a negative look-behind for \\1, which we can't do. So I came up with this weird-looking regex. I don't like it myself, but you can try it and see whether it works:

Not tested:

"(\\w),(?!\\1)(\\w)(?:,\\2){4}(?!,\\2)"

Explanation:

(       // First Capture Group
  \\w   // Any character, before your required sequence. (e.g. `A` in `A,C,C,C,C,C`)
)       // Group end
,       // comma after `A`

(?!\\1)    // Negative look-ahead: the next character must differ from
           // the one in the first captured group. (`^` only negates inside
           // a character class, so it cannot be used here.)
(          // Captured group 2
   \\w     // First character of the required sequence,
           // since we now want a sequence of `C` after `A`
)
(?:        // non-capturing group
   ,       // Match comma
   \\2     // match the 2nd capture group character. Which is different from `A`, 
           // and same as the one in group 2, may be `C`

){4}       // Match 4 times

(?!        // Negative look-ahead
    ,
    \\2    // for the 2nd captured group, `C`
)

I don't know whether that explanation makes much sense, but you can try it. If it works and you can't follow it, I'll try to explain it a little better.
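One way to express "a character other than the one in group 1" in Java is a negative look-ahead `(?!\\1)`, since `^` acts as a start anchor outside a character class. The sketch below is a variant, not the regex from this answer. It assumes single-character tokens separated by commas, and it adds a `^` alternative so that a run at the very start of the input also counts. The class and method names are illustrative.

```java
import java.util.regex.Pattern;

class ExactRunDemo {
    // Either the run starts the input, or it is preceded by a different character.
    // Group 2 is the repeated character; it must appear exactly 5 times in a row.
    static final Pattern EXACTLY_FIVE =
            Pattern.compile("(?:^|(\\w),(?!\\1))(\\w)(?:,\\2){4}(?!,\\2)");

    static boolean hasExactRunOfFive(String input) {
        return EXACTLY_FIVE.matcher(input).find();
    }

    public static void main(String[] args) {
        String[] inputs = {
            "A,B,C,C,C,C,A",            // four C's
            "A,B,C,C,C,C,C,A",          // five C's
            "A,B,C,C,C,C,C,C,A",        // six C's
            "A,A,A,F,F,R,E,D,C,C,C,C,C" // five C's at the end
        };
        for (String s : inputs) {
            System.out.println(s + " ==> " + hasExactRunOfFive(s));
        }
    }
}
```

Only the inputs with a run of exactly five are reported; the run of six fails both at its first character (the look-ahead after the fifth `C` rejects it) and at any later character (the preceding character is the same).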

Answer 2

I don't understand what you have tried, but you don't need to escape letters to match them.

I am not sure what your requirements are, but to find 5 repeated characters you can use this:

(\\p{L})(?:,\\1){4}

This would find all letters that are repeated 5 times. See it here on Regexr.

On Regexr I used \\w because \\p{L} is not supported there, but it is supported in Java.

\\p{L} is a Unicode property matching every letter in any language.

The idea here is to match a letter. This is done by \\p{L}.
This letter is stored in a backreference because of the brackets around it: (\\p{L}).
Then there is the non-capturing group (?:,\\1). This matches a comma, and \\1 is a reference to the letter captured before.
This non-capturing group is repeated 4 times: (?:,\\1){4}.

==> As a result, this pattern matches 5 identical letters with commas between them.

The problem here is that this expression will match at least 5 identical letters. If there are more of them, it will also (partly) match.
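To make the difference between \\p{L} and \\w concrete, here is a small sketch. By default, Java's \\w covers only ASCII letters, digits and underscore, while \\p{L} covers letters from any script but no digits. The helper `firstMatch` and the sample inputs are illustrative assumptions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnicodeLetterDemo {
    static final Pattern LETTERS = Pattern.compile("(\\p{L})(?:,\\1){4}");
    static final Pattern WORD_CHARS = Pattern.compile("(\\w)(?:,\\1){4}");

    // Text of the first match, or null if the pattern is not found.
    static String firstMatch(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String accented = "\u00e9,\u00e9,\u00e9,\u00e9,\u00e9"; // five times e-acute
        System.out.println(firstMatch(LETTERS, accented));      // found
        System.out.println(firstMatch(WORD_CHARS, accented));   // null: not an ASCII word character
        System.out.println(firstMatch(LETTERS, "1,1,1,1,1"));   // null: digits are not letters
        // Six C's: only the first five are part of the match.
        System.out.println(firstMatch(LETTERS, "A,B,C,C,C,C,C,C,A"));
    }
}
```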

Update:

I don't see a way to get the result directly from a regex. But here is a method to get the length indirectly:

String[] testInput = { "A,B,C,C,C,C,C", "A,B,C,C,C,C,C,D,E",
        "C,C,C,C,C", "C,C,C,C,C,D,E", "A,B,C,C,C,C", "C,C,C,C",
        "A,B,C,C,C,C,C,C,D,E", "C,C,C,C,C,C,D,E", "C,C,C,C,C,C" };

// Match at least 5 identical letters in a row.
// The letter is in group 2.
// The complete found sequence is in group 1.
Pattern p = Pattern.compile("((\\p{L})(?:,\\2){4,})");

for (String t : testInput) {
    Matcher m = p.matcher(t);
    if (m.find()) {
        // Get the length of the found sequence after the commas have
        // been removed
        int letterLength = m.group(1).replace(",", "").length();
        // Check the condition of exactly 5 equal letters
        System.out.println(t + " ==> " + (letterLength == 5));
    } else {
        System.out.println(t + " ==> " + false);
    }
}

Output:

A,B,C,C,C,C,C ==> true
A,B,C,C,C,C,C,D,E ==> true
C,C,C,C,C ==> true
C,C,C,C,C,D,E ==> true
A,B,C,C,C,C ==> false
C,C,C,C ==> false
A,B,C,C,C,C,C,C,D,E ==> false
C,C,C,C,C,C,D,E ==> false
C,C,C,C,C,C ==> false
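The loop above inspects only the first run that `find()` returns. If a later run of exactly 5 should also count (an assumption about the requirements, not something stated in the question), one possible variant keeps calling `find()` and checks each run in turn. Because `{4,}` is greedy, each match covers a whole run of identical letters. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExactRunScanDemo {
    // At least 5 identical letters in a row, separated by commas.
    static final Pattern RUN = Pattern.compile("(\\p{L})(?:,\\1){4,}");

    // True if any run in the input consists of exactly 5 identical letters.
    static boolean hasRunOfExactlyFive(String input) {
        Matcher m = RUN.matcher(input);
        while (m.find()) {
            // Each match looks like "C,C,C,...", so splitting on commas counts the letters.
            int letters = m.group().split(",").length;
            if (letters == 5) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Six C's followed by five D's: the second run satisfies the condition.
        System.out.println(hasRunOfExactlyFive("C,C,C,C,C,C,D,D,D,D,D"));
        // Six C's only: no run of exactly five.
        System.out.println(hasRunOfExactlyFive("A,B,C,C,C,C,C,C,D,E"));
    }
}
```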

2 answers

Answer 1: 4, accepted, 2012-11-09 12:20:43
Answer 2: 2, 2012-11-09 12:20:42
