
Java regular expressions, dot match

I have a string in the form of:

{something here}{something here}{something here}

and so on. In summary, there are curly brackets with text or anything else inside (any character). I want to split it into an array (using Java's String.split(regex) function). The regex I used is \\{.*\\} but it's not working.

Any ideas?

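public class SplitBraces {
    public static void main(String[] args) {
        String input = "{something here}{something here}{something here}";
        // Strip the outermost braces, then split on the "}{" separators in between.
        String[] parts = input.substring(1, input.length() - 1).split("\\}\\{");
        for (String s : parts) {
            System.out.println(s);
        }
    }
}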

Output:

something here
something here
something here

In regex, the * quantifier is greedy, meaning that it will consume as many characters as possible. This means that the regex \\{.*\\} will match the entire string, as the .* will match "something here}{something here}{something here". Putting a ? after a * makes it behave in a non-greedy fashion, meaning that it will only consume characters up until the next expression can match. Therefore, try \\{.*?\\} as your regex instead.
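To see the difference concretely, here is a minimal sketch that runs both quantifiers against the string from the question. The class name GreedyVsReluctant and the helper firstMatch are made up for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyVsReluctant {
    // Returns the first substring of input matched by regex, or null if nothing matches.
    public static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "{something here}{something here}{something here}";
        // Greedy: .* runs to the end of the input, then backtracks to the last '}'.
        System.out.println(firstMatch("\\{.*\\}", input));
        // Reluctant: .*? grows one character at a time and stops at the first '}'.
        System.out.println(firstMatch("\\{.*?\\}", input));
    }
}
```

The first call returns the whole input, the second only the first {something here} group.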

The argument to the split() method specifies the separator between the parts, not the parts themselves.
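A small sketch of what that means in practice, using a shortened input for readability. The class and method names (SplitSeparatorDemo, splitOnParts, splitOnSeparators) are hypothetical.

```java
import java.util.Arrays;

public class SplitSeparatorDemo {
    // Passes a pattern that matches the parts themselves: everything is treated
    // as a separator, leaving only empty strings, which split() then discards.
    public static String[] splitOnParts(String input) {
        return input.split("\\{.*?\\}");
    }

    // Strips the outer braces and splits on the "}{" between the parts.
    public static String[] splitOnSeparators(String input) {
        return input.substring(1, input.length() - 1).split("\\}\\{");
    }

    public static void main(String[] args) {
        String input = "{a}{b}{c}";
        System.out.println(splitOnParts(input).length);                   // 0
        System.out.println(Arrays.toString(splitOnSeparators(input)));    // [a, b, c]
    }
}
```

Note that splitOnSeparators assumes the input starts with '{' and ends with '}'; it does no validation.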

As suggested by Sotirios Delimanolis, you can achieve your goal by repeatedly matching a pattern. The example code below gets the text inside the braces.

    // Needs java.util.regex.Matcher and java.util.regex.Pattern.
    String val = "{alpha}{beta}{delta\nepsilon}";
    Pattern pattern = Pattern.compile("\\{(.*?)\\}", Pattern.DOTALL);
    Matcher matcher = pattern.matcher(val);
    while (matcher.find()) {
        String part = matcher.group(1);  // text between the braces
        System.out.printf("%s,", part);
    }

The .*? expression provides a reluctant match, which prefers to match as few characters from the string as possible. If you just use .* , that's a greedy match: the first match will be the entire string.

You mentioned in a comment on a deleted answer that your "something here" strings can contain newlines. For the '.' to match newlines, you need to use the Pattern.DOTALL flag, as shown above.
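The effect of the flag can be checked with a short sketch like the one below, which counts matches with and without Pattern.DOTALL on the same input. DotAllDemo and its parts helper are illustrative names only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DotAllDemo {
    // Collects the text captured by group 1 for every match of pattern in input.
    public static List<String> parts(Pattern pattern, String input) {
        List<String> result = new ArrayList<>();
        Matcher m = pattern.matcher(input);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        String val = "{alpha}{beta}{delta\nepsilon}";
        // Default mode: '.' does not match '\n', so the third group is never matched.
        System.out.println(parts(Pattern.compile("\\{(.*?)\\}"), val).size());                  // 2
        // DOTALL mode: '.' matches any character, including line terminators.
        System.out.println(parts(Pattern.compile("\\{(.*?)\\}", Pattern.DOTALL), val).size());  // 3
    }
}
```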

The call to matcher.group(1) gives you the text matching the capturing group (.*?). If you wanted the braces included, you could omit the parentheses and simply call matcher.group(0) to get the entire match.
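For a side-by-side look at the two group numbers, here is a brief sketch. GroupDemo and firstGroups are hypothetical names chosen for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupDemo {
    // For the first match, returns { entire match, contents of capturing group 1 },
    // or null when the input contains no brace pair.
    public static String[] firstGroups(String input) {
        Matcher m = Pattern.compile("\\{(.*?)\\}", Pattern.DOTALL).matcher(input);
        return m.find() ? new String[] { m.group(0), m.group(1) } : null;
    }

    public static void main(String[] args) {
        String[] groups = firstGroups("{alpha}{beta}");
        System.out.println(groups[0]);  // {alpha}
        System.out.println(groups[1]);  // alpha
    }
}
```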

You know the quote about regexps ... Well, it doesn't help.

What does help, however, is (as usual) testing. Regular expressions can be tested online using websites such as RegExp Planet.

So, when using {a}{long}{text with spaces} as the test string, I find the following to be a "good" regexp: \\{([^\\}]*)\\} . And, to quote the source site:

Regular Expression: \{([^\}]*)\}
as a Java string: "\\{([^\\}]*)\\}"

Because one shouldn't forget to double the backslashes.
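As a rough sketch of how this negated-character-class pattern could be used from Java, the example below collects the contents of each brace pair into a list. The names BraceContents and extract are invented for the illustration. Since [^\}] matches any character except '}', including newlines, this variant works without Pattern.DOTALL.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BraceContents {
    // '{', then any run of characters other than '}' (captured), then '}'.
    private static final Pattern BRACES = Pattern.compile("\\{([^\\}]*)\\}");

    // Returns the text found inside each {...} pair, in order of appearance.
    public static List<String> extract(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = BRACES.matcher(input);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(extract("{a}{long}{text with spaces}"));  // [a, long, text with spaces]
    }
}
```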
