Extract a String from within a String using a Regular Expression


I have a very large String that contains markers like this:

{codecitation class="brush: java; gutter: true;" width="700px"}

I need to collect all the markers contained in the long String. The difficulty is that the markers all contain different parameter values. The only thing they have in common is the initial part:

{codecitation class="brush: [VARIABLE PART] }

Do you have any suggestions for collecting all the markers in Java using a regular expression?

Use pattern matching to find the markers, as below. I hope this helps.

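import java.util.regex.Matcher;
import java.util.regex.Pattern;

// The input is plain text with embedded markers.
String xmlString = "{codecitation class=\"brush: java; gutter: true;\" width=\"700px\"}efasf{codecitation class=\"brush: java; gutter: true;\" width=\"700px\"}";
// "{codecitation", then any run of letters, digits, spaces and the characters " : ; =, then "}".
Pattern pattern = Pattern.compile("(\\{codecitation)([0-9 a-z A-Z \":;=]{0,})(\\})");
Matcher matcher = pattern.matcher(xmlString);

// Each call to find() advances to the next marker; group() returns the whole match.
while (matcher.find()) {
    System.out.println(matcher.group());
}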
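
One limitation of the character class above is that it only admits letters, digits, spaces and the characters " : ; =, so a marker whose parameters contain anything else (for example width="50%") would be skipped. A minimal sketch of a more permissive variant follows. It assumes that parameter values never contain a closing brace, and it anchors on the common prefix given in the question. The class name MarkerCollector and the method collectMarkers are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MarkerCollector {

    // Assumption: a marker starts with the fixed prefix and runs up to the
    // first '}', so parameter values must not contain '}' themselves.
    private static final Pattern MARKER =
            Pattern.compile("\\{codecitation class=\"brush: [^}]*\\}");

    // Returns every marker found in the text, in order of appearance.
    public static List<String> collectMarkers(String text) {
        List<String> markers = new ArrayList<>();
        Matcher matcher = MARKER.matcher(text);
        while (matcher.find()) {
            markers.add(matcher.group());
        }
        return markers;
    }

    public static void main(String[] args) {
        String text = "intro {codecitation class=\"brush: java; gutter: true;\" width=\"700px\"}"
                + " middle {codecitation class=\"brush: xml; gutter: false;\" width=\"50%\"} end";
        for (String marker : collectMarkers(text)) {
            System.out.println(marker);
        }
    }
}
```

Using the negated class [^}]* rather than a list of allowed characters means new parameter values do not require changes to the pattern, at the cost of accepting anything up to the next closing brace.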

I guess you are particularly interested in the brush: java; and gutter: true; parts.

Maybe this snippet helps:

package test;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CodecitationParserTest {

    public static void main(String[] args) {
        String testString = "{codecitation class=\"brush: java; gutter: true;\" width=\"700px\"}";
        // Group 1 captures the value of the class attribute; the rest of the marker up to "}" is skipped.
        Pattern codecitationPattern = Pattern
                .compile("\\{codecitation class=[\"]([^\"]*)[\"][^}]*\\}");
        Matcher matcher = codecitationPattern.matcher(testString);

        // Group 1 = key, group 2 = value, group 3 = whatever follows the first "key: value;" pair.
        Pattern attributePattern = Pattern
                .compile("\\s*([^:]*): ([^;]*);(.*)$");
        Matcher attributeMatcher;
        while (matcher.find()) {
            System.out.println(matcher.group(1));
            attributeMatcher = attributePattern.matcher(matcher.group(1));
            while (attributeMatcher.find()) {
                System.out.println(attributeMatcher.group(1) + "->"
                        + attributeMatcher.group(2));
                // Re-run the pattern on the remainder (group 3) to reach the next pair.
                attributeMatcher = attributePattern.matcher(attributeMatcher
                        .group(3));
            }
        }
        }
    }

}

The codecitationPattern extracts the content of the class attribute of a codecitation element. The attributePattern extracts the first key and value and captures the remainder, so you can apply it again to the remainder until no pairs are left.
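
As an alternative to re-matching the remainder, a single pattern that describes one "key: value;" pair can be walked with repeated find() calls on the same Matcher. The following is only a sketch under the assumption that keys contain neither ':' nor ';' and values contain no ';'. The names ClassAttributeParser and parse are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ClassAttributeParser {

    // One "key: value;" pair; whitespace around the key and the value is not captured.
    private static final Pattern PAIR =
            Pattern.compile("\\s*([^:;]+?)\\s*:\\s*([^;]*?)\\s*;");

    // Returns the pairs in the order they appear in the class attribute.
    public static Map<String, String> parse(String classAttribute) {
        Map<String, String> pairs = new LinkedHashMap<>();
        Matcher matcher = PAIR.matcher(classAttribute);
        // Each find() resumes where the previous match ended.
        while (matcher.find()) {
            pairs.put(matcher.group(1), matcher.group(2));
        }
        return pairs;
    }

    public static void main(String[] args) {
        // prints {brush=java, gutter=true}
        System.out.println(parse("brush: java; gutter: true;"));
    }
}
```

The input to parse would be group 1 of the codecitationPattern match, that is, the text between the quotes of the class attribute.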
