
Regex: extract String from String

I need a regex that extracts part of a String. I get this String by parsing an XML document with DOM. I then look for the "§regex" part of the String and try to extract its value, e.g. "([A-ZÄÖÜ]{1,3}[- ][A-Z]{1,2}[1-9][0-9]{0,3})", from the rest.

The problem is that I don't know how to make sure the extracted part ends with a ")". The regex needs to work for every value given. The goal is to write only the value in brackets after "§regex=", including the brackets, into a String.

<UML:TaggedValue tag="description" value=" random Text §regex=([A-ZÄÖÜ]{1,3}[- ][A-Z]{1,2}[1-9][0-9]{0,3}) random text"/>

private List<String> findRegex() {
    List<String> forReturn = new ArrayList<String>();
    for (String str : attDescription) {
        if (str.contains("§regex=")) {
            String s = str.replaceAll(regex);
            forReturn.add(s);
        }
    }
    return forReturn;
}

attDescription is a list that contains all attributes found in the parsed XML document.

So far I have tried this regex: ".*(§regex=)(.*)[)$].*", "$2", but it cuts off the ")" and does not delete the text in front of the part I am searching for. Even with the help of http://docs.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html I really don't understand how to get what I need.

It seems to work for me (with this example, anyway) if I use this in place of String s = str.replaceAll(regex);

String s = str.replaceAll( ".*§regex=(\\(.*\\)).*", "$1" );

It just looks for a substring enclosed in parentheses following §regex=.
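To try the replacement above in isolation, here is a minimal sketch. The class name `RegexExtract` and the helper `extract` are made up for illustration, and the input is the `value` attribute from the question.

```java
public class RegexExtract {
    // Hypothetical helper: keeps only the bracketed value that follows "§regex=".
    static String extract(String str) {
        // Group 1 spans from the "(" right after "§regex=" to the last ")" in the string.
        return str.replaceAll(".*§regex=(\\(.*\\)).*", "$1");
    }

    public static void main(String[] args) {
        String value = " random Text §regex=([A-ZÄÖÜ]{1,3}[- ][A-Z]{1,2}[1-9][0-9]{0,3}) random text";
        // prints ([A-ZÄÖÜ]{1,3}[- ][A-Z]{1,2}[1-9][0-9]{0,3})
        System.out.println(extract(value));
    }
}
```

Because the inner .* is greedy, the capture runs up to the last ")" in the string, so a ")" in the trailing text would be pulled in as well.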

This seems to work:

String s = str.replaceAll(".*§regex=\\((.*)[)].*", "$1");

Note:

  • Escape the leading bracket
  • The $ inside a character class is a literal $. Ignore it, because your regex should always end with a bracket
  • No need to capture the fixed text
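The point about $ can be checked directly. This is a small sketch; the class name `CharClassDollar` and the method `endsWithBracketOrDollar` are only illustrative. It shows that [)$] accepts either a closing bracket or a literal dollar sign, and does not act as an end-of-input anchor.

```java
import java.util.regex.Pattern;

public class CharClassDollar {
    // True if the whole string ends in ")" or in a literal "$".
    static boolean endsWithBracketOrDollar(String s) {
        return Pattern.matches(".*[)$]", s);
    }

    public static void main(String[] args) {
        System.out.println(endsWithBracketOrDollar("abc)")); // true
        System.out.println(endsWithBracketOrDollar("abc$")); // true: "$" is matched as a character
        System.out.println(endsWithBracketOrDollar("abc"));  // false: no end-of-input meaning here
    }
}
```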

Test code, noting that this works with brackets in/around the regex:

String str = "random Text §regex=(([A-ZÄÖÜ]{1,3}[- ][A-Z]{1,2}[1-9][0-9]{0,3})) random text";
String s = str.replaceAll(".*§regex=\\((.*)[)].*", "$1");
System.out.println(s);

Output:

([A-ZÄÖÜ]{1,3}[- ][A-Z]{1,2}[1-9][0-9]{0,3})
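As an alternative to replaceAll, the same extraction could be sketched with Pattern and Matcher, which also replaces the separate contains check. This is not taken from either answer. The class name `RegexTagExtractor` is hypothetical, and `findRegex` here takes the list as a parameter instead of reading a field, so that the example is self-contained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTagExtractor {
    // Group 1: from the "(" right after "§regex=" up to the last ")" on the line.
    private static final Pattern TAG = Pattern.compile("§regex=(\\(.*\\))");

    // Hypothetical variant of findRegex(): strings without a match are skipped.
    static List<String> findRegex(List<String> attDescription) {
        List<String> forReturn = new ArrayList<String>();
        for (String str : attDescription) {
            Matcher m = TAG.matcher(str);
            if (m.find()) {
                forReturn.add(m.group(1)); // the value including its brackets
            }
        }
        return forReturn;
    }

    public static void main(String[] args) {
        List<String> input = new ArrayList<String>();
        input.add(" random Text §regex=([A-ZÄÖÜ]{1,3}[- ][A-Z]{1,2}[1-9][0-9]{0,3}) random text");
        input.add("no tag in this attribute");
        System.out.println(findRegex(input));
    }
}
```

With find() there is no need for the leading and trailing .*, since the pattern does not have to cover the whole string. The greedy .* still runs to the last ")" on the line, as in the replaceAll versions.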
