
Pattern matching in Java using regex

I have a long string that I need to parse for several keywords. For example, I have the string:

"==References== This is a reference ==Further reading== *{{cite book|editor1-last=Lukes|editor1-first=Steven|editor2-last=Carrithers|}} * ==External links=="

And my keywords are:

'==References==' '==External links==' '==Further reading=='

I have tried many regex combinations, but I am not able to recover all of the keywords.

The code I have tried:

Pattern pattern = Pattern.compile("\\=+[A-Za-z]\\=+");
Matcher matcher = pattern.matcher(textBuffer.toString());

while (matcher.find()) {
    System.out.println(matcher.group(0));
}

You don't need to escape the = sign. You should also include a space inside your character class, because some of the keywords contain one.

Apart from that, you also need a quantifier on your character class so that it matches more than one character. Try this regex:

Pattern pattern = Pattern.compile("=+[A-Za-z ]+=+");
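
For reference, here is a minimal, self-contained sketch of the corrected pattern run against the string from the question. The class name HeadingFinder and the method findHeadings are made up for illustration; only the regex comes from the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HeadingFinder {
    // One or more '=', then letters and spaces, then one or more '='.
    static final Pattern HEADING = Pattern.compile("=+[A-Za-z ]+=+");

    // Hypothetical helper: collects every match of HEADING in the text.
    static List<String> findHeadings(String text) {
        List<String> headings = new ArrayList<>();
        Matcher matcher = HEADING.matcher(text);
        while (matcher.find()) {
            headings.add(matcher.group());
        }
        return headings;
    }

    public static void main(String[] args) {
        String text = "==References== This is a reference ==Further reading== "
                + "*{{cite book|editor1-last=Lukes|editor1-first=Steven|editor2-last=Carrithers|}} * "
                + "==External links==";
        // Prints the three headings, one per line.
        findHeadings(text).forEach(System.out::println);
    }
}
```

On this input the three keywords are printed. The single = signs inside the {{cite book ...}} template do not match, because each is followed by a | before the next = is reached.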

You can also make the pattern more flexible so that it accepts any characters between the two == delimiters, by using .+? (you need a reluctant quantifier with . to stop it from matching everything up to the last ==) or [^=]+:

Pattern pattern = Pattern.compile("=+[^=]+=+");
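
One thing to watch with the looser pattern: on the exact string from the question, the single = signs inside the {{cite book ...}} template also start matches, and the last keyword is lost because its opening == is consumed by an earlier match. The sketch below compares the pattern above with one possible tightening that requires at least two = on each side. The ={2,} variant is an assumption of this example, not part of the answer, and the class and field names are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LooseVsStrictHeadings {
    // Pattern from the answer: any run of '=' on either side.
    static final Pattern LOOSE = Pattern.compile("=+[^=]+=+");
    // Assumed variant: at least two '=' on each side, so "key=value" is ignored.
    static final Pattern STRICT = Pattern.compile("={2,}[^=]+={2,}");

    static final String SAMPLE = "==References== This is a reference ==Further reading== "
            + "*{{cite book|editor1-last=Lukes|editor1-first=Steven|editor2-last=Carrithers|}} * "
            + "==External links==";

    // Hypothetical helper: collects every match of the given pattern.
    static List<String> find(Pattern pattern, String text) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = pattern.matcher(text);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        System.out.println("loose:  " + find(LOOSE, SAMPLE));
        System.out.println("strict: " + find(STRICT, SAMPLE));
    }
}
```

With LOOSE the matches are ==References==, ==Further reading==, =Lukes|editor1-first= and =Carrithers|}} * ==. With STRICT they are the three expected keywords. Whether two = is the right minimum depends on your input.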

If the number of = signs must be the same on both sides, modify your regex to use a capture group and a backreference:

"(=+)[^=]+\\1"
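
A small sketch of the backreference version follows. The input string, the class name BalancedHeadings and the method describe are hypothetical. The length of group 1 is used here as the heading level, which is an extra that the answer does not mention.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BalancedHeadings {
    // Group 1 captures the opening run of '='; \1 requires the same run to close.
    static final Pattern BALANCED = Pattern.compile("(=+)[^=]+\\1");

    // Hypothetical helper: reports each match with the length of its '=' run.
    static List<String> describe(String text) {
        List<String> result = new ArrayList<>();
        Matcher matcher = BALANCED.matcher(text);
        while (matcher.find()) {
            int level = matcher.group(1).length();
            result.add("level " + level + ": " + matcher.group());
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up input with two heading levels.
        String text = "==References== some text ===Sub section=== more text";
        describe(text).forEach(System.out::println);
    }
}
```

This prints "level 2: ==References==" and "level 3: ===Sub section===". Note that the backreference does not reject unbalanced input outright: for "===Odd==" the matcher still finds "==Odd==" starting at the second =.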
