I can't get the first group of a regex pattern in Java

I'm trying to get the first group of a regex pattern. I got this string from a lyric text:

[01:34][01:36]Blablablahh nanana

I'm using this regex pattern to extract [01:34], [01:36] and the text.

Pattern timeLine = Pattern.compile("(\\[\\d\\d:\\d\\d\\])+(.*)");

But when I try to extract the first group, [01:34], using group(1), it returns [01:36].

Is there something wrong in the regex pattern?

Your problem is here:

Pattern.compile("(\\[\\d\\d:\\d\\d\\])+(.*)");
                                      ^

This part of your pattern, (\\[\\d\\d:\\d\\d\\])+, will match [01:34][01:36] because of the + quantifier (which is greedy and repeats the group). Your group 1 can hold only one [dd:dd] value, though, so each repetition overwrites the previous capture and the group ends up storing the last match found, [01:36].
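To see the overwriting behaviour in isolation, here is a minimal sketch that runs the pattern from the question against the sample line. The class and method names (RepeatedGroupDemo, lastCapture) are made up for illustration and are not part of the original post.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RepeatedGroupDemo {
    // The pattern from the question: group 1 is repeated by the + quantifier.
    private static final Pattern TIME_LINE = Pattern.compile("(\\[\\d\\d:\\d\\d\\])+(.*)");

    // Returns what group(1) holds once the whole line has matched, or null if it does not match.
    static String lastCapture(String line) {
        Matcher m = TIME_LINE.matcher(line);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Group 1 matched twice; only the second capture is kept.
        System.out.println(lastCapture("[01:34][01:36]Blablablahh nanana")); // prints [01:36]
    }
}
```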

If you want to find only [01:34], you can correct your pattern by removing the +. But you can also create a simpler pattern:

Pattern.compile("^\\[\\d\\d:\\d\\d\\]");

and use it with group(0), which is also what group() with no arguments returns.

Pattern timeLine = Pattern.compile("^\\[\\d\\d:\\d\\d\\]");
Matcher m = timeLine.matcher("[01:34][01:36]Blablablahh nanana");
while (m.find()) {
    System.out.println(m.group()); // prints [01:34]
}
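For comparison, this is roughly what the first suggestion (keeping the original pattern but dropping the +) would give. It is a sketch only; the names NoQuantifierDemo and split are hypothetical. Note that the second timestamp then becomes part of group 2 along with the text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NoQuantifierDemo {
    // The asker's pattern with the + removed, so group 1 matches exactly one timestamp.
    private static final Pattern TIME_LINE = Pattern.compile("(\\[\\d\\d:\\d\\d\\])(.*)");

    // Returns {group 1, group 2}, or null if the line does not start with a timestamp.
    static String[] split(String line) {
        Matcher m = TIME_LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] parts = split("[01:34][01:36]Blablablahh nanana");
        System.out.println(parts[0]); // prints [01:34]
        System.out.println(parts[1]); // prints [01:36]Blablablahh nanana
    }
}
```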

In case you want to extract both timestamps together, [01:34][01:36], you can just add another pair of parentheses to your current regex, like:

Pattern.compile("((\\[\\d\\d:\\d\\d\\])+)(.*)");

This way the entire match of (\\[\\d\\d:\\d\\d\\])+ will be in group 1. The inner group becomes group 2 and still holds only the last timestamp, and the text moves to group 3.

You can also achieve this by removing (.*) from your original pattern and reading group 0 after a successful find().
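Both variants can be checked side by side with a short sketch like the one below. The class and method names (WholeRunDemo, nestedGroups, groupZero) are illustrative assumptions; the patterns are the ones described above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WholeRunDemo {
    // Variant 1: an extra outer group wraps the repeated group.
    // Returns {group 1, group 2, group 3}, or null if the line does not match.
    static String[] nestedGroups(String line) {
        Matcher m = Pattern.compile("((\\[\\d\\d:\\d\\d\\])+)(.*)").matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    // Variant 2: no (.*) at all; find() locates the run of timestamps and group 0 is the whole match.
    static String groupZero(String line) {
        Matcher m = Pattern.compile("(\\[\\d\\d:\\d\\d\\])+").matcher(line);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String line = "[01:34][01:36]Blablablahh nanana";
        String[] g = nestedGroups(line);
        System.out.println(g[0]); // prints [01:34][01:36]
        System.out.println(g[1]); // prints [01:36]
        System.out.println(g[2]); // prints Blablablahh nanana
        System.out.println(groupZero(line)); // prints [01:34][01:36]
    }
}
```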

I think you are confused by the repeating match (\\[\\d\\d:\\d\\d\\])+, which returns just the last match as the group value. Try the following and see if it makes more sense to you:

    String s = "[01:34][01:36]Blablablahh nanana";
    Pattern timeLine = Pattern.compile("(\\[\\d\\d:\\d\\d\\])(\\[\\d\\d:\\d\\d\\])(.+)");
    Matcher m = timeLine.matcher(s);
    if (m.matches()) {
        for (int i = 1; i <= m.groupCount(); i++) {
            System.out.printf("    Group %d -> %s\n", i, m.group(i)); // prints each group on its own line
        }
    }

which for me returns:

Group 1 -> [01:34]
Group 2 -> [01:36]
Group 3 -> Blablablahh nanana
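The pattern above assumes exactly two timestamps per line. If the number can vary, one possible approach (not from the original answers) is to call find() in a loop with a pattern anchored by \G, so that only the leading run of timestamps is collected. The sketch below assumes that format; LyricLineParser, timestamps and text are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LyricLineParser {
    // \G anchors each match to the end of the previous one (or to the start of the input),
    // so a timestamp that appears later inside the lyric text is not picked up.
    private static final Pattern STAMP = Pattern.compile("\\G\\[\\d\\d:\\d\\d\\]");

    // Collects every timestamp in the leading run, in order.
    static List<String> timestamps(String line) {
        List<String> result = new ArrayList<>();
        Matcher m = STAMP.matcher(line);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    // Returns whatever follows the leading run of timestamps.
    static String text(String line) {
        Matcher m = STAMP.matcher(line);
        int end = 0;
        while (m.find()) {
            end = m.end();
        }
        return line.substring(end);
    }

    public static void main(String[] args) {
        String line = "[01:34][01:36]Blablablahh nanana";
        System.out.println(timestamps(line)); // prints [[01:34], [01:36]]
        System.out.println(text(line));       // prints Blablablahh nanana
    }
}
```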

I would simply grab the first part using a character class:

String timings = str.replaceAll("([\\[\\]\\d:]+).*", "$1");

And similarly the text:

String text = str.replaceAll("[\\[\\]\\d:]+", "");
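Here is a small sketch of the two replaceAll calls in action. One thing to be aware of: the second call is not anchored, so it also strips digits, colons and square brackets that appear inside the lyric text itself. The anchored variant (anchoredText) is my own assumption about how one might avoid that, not something from the answer; all class and method names are illustrative.

```java
public class ReplaceAllDemo {
    // As in the answer: keep the leading run of bracket/digit/colon characters, drop the rest.
    static String timings(String str) {
        return str.replaceAll("([\\[\\]\\d:]+).*", "$1");
    }

    // As in the answer: remove every run of bracket/digit/colon characters, wherever it occurs.
    static String text(String str) {
        return str.replaceAll("[\\[\\]\\d:]+", "");
    }

    // Assumed variant: ^ restricts the removal to the run at the start of the line.
    static String anchoredText(String str) {
        return str.replaceAll("^[\\[\\]\\d:]+", "");
    }

    public static void main(String[] args) {
        String str = "[01:34][01:36]Blablablahh nanana";
        System.out.println(timings(str)); // prints [01:34][01:36]
        System.out.println(text(str));    // prints Blablablahh nanana

        // Digits inside the lyrics survive only with the anchored variant.
        System.out.println(anchoredText("[00:10]Route 66: go")); // prints Route 66: go
    }
}
```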
