
Java regex, find text inside

I need to find a string in a text: the first occurrence inside double braces after a keyword. This is the example text:

Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. === FIRST KEYWORD === veniam, {{ text need to get }} ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit {{in voluptate velit esse cillum}} dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia {{deserunt mollit anim }}id est laborum

So I need to get the text inside the first pair of braces after the first keyword.

I tried many combinations, but the best result I got was the text from the last pair of braces, not the first. With this expression I got the text after the keyword: (?<==== FIRST KEYWORD ===).(.|\\n)* But I did not succeed in finding the first text in braces.

UPD: Thank you all, but the answer from Bohemian does not work for my corpus.
This answer:

"(?<==== FIRST KEYWORD ===)[^{]*\\{\\{([^}]*)\\}\\}"

works, but I can no longer see it, so I cannot thank the person who wrote it; I don't remember who it was.
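For reference, the lookbehind expression quoted above could be applied with `Pattern` and `Matcher` roughly as follows. This is only a sketch: the class name `FirstBraceAfterKeyword` and the method `extract` are made up for illustration, and it assumes no single `{` appears between the keyword and the first `{{`, and no `}` appears inside the wanted text, since `[^{]*` and `[^}]*` would stop there. Negated character classes match line breaks, so no DOTALL flag is needed.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FirstBraceAfterKeyword {
    // The lookbehind pins the match start right after the keyword; [^{]* skips
    // ahead to the first "{{", and group 1 captures the text before "}}".
    private static final Pattern PATTERN =
            Pattern.compile("(?<==== FIRST KEYWORD ===)[^{]*\\{\\{([^}]*)\\}\\}");

    /** Returns the content of the first {{...}} after the keyword, or null if none is found. */
    static String extract(String input) {
        Matcher m = PATTERN.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "one {{zzz}} two === FIRST KEYWORD === three\n{{xxx}} four {{yyy}} five";
        System.out.println(extract(input)); // prints xxx
    }
}
```

The braces before the keyword (`{{zzz}}`) are ignored because the lookbehind cannot succeed at any earlier position.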

This code extracts your target:

String target = input.replaceAll("(?s).*?=== FIRST KEYWORD ===.*?\\{\\{(.*?)\\}\\}.*", "$1");

The important part of the regex is the use of the reluctant quantifier .*?, which stops consuming input at the first available match (rather than skipping over it to a subsequent match).
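To see the difference, here is a small comparison of the same expression with greedy and reluctant quantifiers. The class and method names are hypothetical; only the regex with `.*?` comes from this answer, and the greedy variant is written just for contrast.

```java
class GreedyVsReluctant {
    // Greedy: each .* grabs as much as possible, so the capture ends up
    // being the content of the LAST {{...}} after the keyword.
    static String greedy(String input) {
        return input.replaceAll("(?s).*=== FIRST KEYWORD ===.*\\{\\{(.*)\\}\\}.*", "$1");
    }

    // Reluctant: each .*? grabs as little as possible, so the capture is
    // the content of the FIRST {{...}} after the keyword.
    static String reluctant(String input) {
        return input.replaceAll("(?s).*?=== FIRST KEYWORD ===.*?\\{\\{(.*?)\\}\\}.*", "$1");
    }

    public static void main(String[] args) {
        String input = "one two === FIRST KEYWORD === three {{xxx}} four {{yyy}} five";
        System.out.println(greedy(input));    // prints yyy
        System.out.println(reluctant(input)); // prints xxx
    }
}
```

The greedy result matches the symptom described in the question, where the text from the last pair of braces was returned.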

Edit:

Note (thanks to @guido for pointing this out) that the dotall flag (?s) has been added, which allows the dot to match across line breaks. This is required when working with multi-line input.
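The effect of the flag can be checked with a short experiment. In this sketch (class and method names are illustrative only) the keyword and the braces sit on different lines, so without `(?s)` the dot cannot cross the line break, nothing matches, and `replaceAll` returns the input unchanged.

```java
class DotallDemo {
    // Same expression without the dotall flag: '.' does not match '\n'.
    static String withoutDotall(String input) {
        return input.replaceAll(".*?=== FIRST KEYWORD ===.*?\\{\\{(.*?)\\}\\}.*", "$1");
    }

    // With (?s), '.' also matches line terminators.
    static String withDotall(String input) {
        return input.replaceAll("(?s).*?=== FIRST KEYWORD ===.*?\\{\\{(.*?)\\}\\}.*", "$1");
    }

    public static void main(String[] args) {
        String input = "one\n=== FIRST KEYWORD ===\nthree {{xxx}} four";
        System.out.println(withoutDotall(input).equals(input)); // prints true (no match, input unchanged)
        System.out.println(withDotall(input));                  // prints xxx
    }
}
```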


Some test code, using an abbreviated form of your example:

String input = "one two === FIRST KEYWORD === three {{xxx}} four {{yyy}} five";
String target = input.replaceAll("(?s).*?=== FIRST KEYWORD ===.*?\\{\\{(.*?)\\}\\}.*", "$1");
System.out.println(target);

Output:

xxx
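One caveat with the replaceAll approach: when the keyword or the braces are missing, nothing is replaced and the whole input comes back unchanged, which is hard to tell apart from a real result. A possible alternative, sketched here with a hypothetical class `SafeExtract` and an `Optional` return type that are not part of the original answer, is to use `Matcher.find()` so the no-match case is explicit.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SafeExtract {
    // Same idea as the replaceAll regex, but find() makes the leading and
    // trailing ".*" parts unnecessary.
    private static final Pattern PATTERN = Pattern.compile(
            "=== FIRST KEYWORD ===.*?\\{\\{(.*?)\\}\\}", Pattern.DOTALL);

    /** Returns the first {{...}} content after the keyword, or empty if there is none. */
    static Optional<String> extract(String input) {
        Matcher m = PATTERN.matcher(input);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(extract("one === FIRST KEYWORD === two {{xxx}} three {{yyy}}")); // prints Optional[xxx]
        System.out.println(extract("no keyword, only {{abc}}"));                            // prints Optional.empty
    }
}
```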

Option 1: If you want the {{text AND the braces}}

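// Needs: import java.util.regex.Matcher;
//        import java.util.regex.Pattern;
//        import java.util.regex.PatternSyntaxException;
// subjectString is the text to search.
String resultString = null;
try {
    // Group 1 spans the braces and everything between them.
    Pattern regex = Pattern.compile("=== FIRST KEYWORD ===[^{]*?(\\{\\{(?:.(?!}}))*.}})", Pattern.DOTALL);
    Matcher regexMatcher = regex.matcher(subjectString);
    if (regexMatcher.find()) {
        resultString = regexMatcher.group(1);
    }
} catch (PatternSyntaxException ex) {
    // Syntax error in the regular expression
}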

Option 2: If you want to match the {{ text but NOT the braces }}

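// Needs: import java.util.regex.Matcher;
//        import java.util.regex.Pattern;
//        import java.util.regex.PatternSyntaxException;
// subjectString is the text to search.
String resultString = null;
try {
    // Group 1 spans only the text between the braces.
    Pattern regex = Pattern.compile("=== FIRST KEYWORD ===[^{]*?\\{\\{((?:.(?!}}))*.)}}", Pattern.DOTALL);
    Matcher regexMatcher = regex.matcher(subjectString);
    if (regexMatcher.find()) {
        resultString = regexMatcher.group(1);
    }
} catch (PatternSyntaxException ex) {
    // Syntax error in the regular expression
}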
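As a quick check, the two options can be run side by side on a shortened sample. This is only a sketch: the class `BraceOptions` and the helper `firstGroup` are invented for the demonstration, and the sample string is an assumption. Note that both patterns require at least one character between the braces, so an empty `{{}}` is not handled.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BraceOptions {
    // Option 1: group 1 includes the braces.
    static final Pattern WITH_BRACES = Pattern.compile(
            "=== FIRST KEYWORD ===[^{]*?(\\{\\{(?:.(?!}}))*.}})", Pattern.DOTALL);

    // Option 2: group 1 holds only the text between the braces.
    static final Pattern WITHOUT_BRACES = Pattern.compile(
            "=== FIRST KEYWORD ===[^{]*?\\{\\{((?:.(?!}}))*.)}}", Pattern.DOTALL);

    /** Returns group 1 of the first match, or null when the pattern does not match. */
    static String firstGroup(Pattern pattern, String subject) {
        Matcher m = pattern.matcher(subject);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String subject = "one === FIRST KEYWORD === two {{xxx}} three {{yyy}} four";
        System.out.println(firstGroup(WITH_BRACES, subject));    // prints {{xxx}}
        System.out.println(firstGroup(WITHOUT_BRACES, subject)); // prints xxx
    }
}
```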
