
Pattern matching in Java using regex

I have a long string that I need to parse for different keywords. For example, I have the string:

"==References== This is a reference ==Further reading== *{{cite book|editor1-last=Lukes|editor1-first=Steven|editor2-last=Carrithers|}} * ==External links=="

My keywords are:

'==References==' '==External links==' '==Further reading=='

I have tried many regex combinations, but I am not able to recover all of the keywords.

The code I have tried:

Pattern pattern = Pattern.compile("\\=+[A-Za-z]\\=+");
Matcher matcher = pattern.matcher(textBuffer.toString());

while (matcher.find()) {
    System.out.println(matcher.group(0));
}

You don't need to escape the = sign. You should also include a space inside your character class, so that multi-word headings such as "Further reading" can match.

Apart from that, you need a quantifier on your character class: without one, [A-Za-z] matches exactly one letter between the = signs. Try this regex:

Pattern pattern = Pattern.compile("=+[A-Za-z ]+=+");
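
As a quick check, here is a minimal sketch that runs the corrected pattern against the sample string from the question. The class name HeadingFinder and the method findHeadings are hypothetical, chosen only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// HeadingFinder and findHeadings are hypothetical names used only for this sketch.
class HeadingFinder {
    // One or more '=', then one or more letters or spaces, then one or more '='.
    private static final Pattern HEADING = Pattern.compile("=+[A-Za-z ]+=+");

    // Collects every substring of text that matches the heading pattern.
    static List<String> findHeadings(String text) {
        List<String> found = new ArrayList<>();
        Matcher matcher = HEADING.matcher(text);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Sample string taken from the question.
        String text = "==References== This is a reference ==Further reading== "
                + "*{{cite book|editor1-last=Lukes|editor1-first=Steven|editor2-last=Carrithers|}} * "
                + "==External links==";
        for (String heading : findHeadings(text)) {
            System.out.println(heading);
        }
    }
}
```

On the sample string this should print the three keywords and skip fragments such as "=Lukes", because "Lukes" is followed by | rather than another = sign.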

You can also make the pattern more flexible and accept any characters between the two == delimiters, by using either .+? (the reluctant quantifier is needed to stop . from matching everything up to the last ==) or [^=]+:

Pattern pattern = Pattern.compile("=+[^=]+=+");
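
One thing worth checking before adopting the broader pattern: on the sample string from the question, single = signs inside the {{cite book}} template also act as delimiters, so the results can differ from the letters-and-spaces version. The sketch below compares the broader pattern with a variation that requires at least two = signs on each side. That variation, the class name PatternComparison, and the method findAll are assumptions made for this illustration, not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// PatternComparison and findAll are hypothetical names used only for this sketch.
class PatternComparison {
    // Returns every match of regex in text, in order of appearance.
    static List<String> findAll(String regex, String text) {
        List<String> found = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(text);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Sample string taken from the question.
        String text = "==References== This is a reference ==Further reading== "
                + "*{{cite book|editor1-last=Lukes|editor1-first=Steven|editor2-last=Carrithers|}} * "
                + "==External links==";

        // Broader pattern from the answer: any run of non-'=' characters between '=' runs.
        System.out.println(findAll("=+[^=]+=+", text));

        // Assumed variation: require at least two '=' on each side,
        // so a lone '=' inside the template is not treated as a delimiter.
        System.out.println(findAll("={2,}[^=]+={2,}", text));
    }
}
```

If the input can contain stray = signs like this, the stricter variation (or the letters-and-spaces character class) is probably the safer choice.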

If the number of = signs must be the same on both sides, modify the regex to use a capture group and a backreference:

"(=+)[^=]+\\1"
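
The backreference version could be used roughly as follows. This sketch adds a second capture group around the heading text so that the title and the number of = signs can be read separately. That extra group, the class name BalancedHeadings, the method parse, and the input string are all assumptions made for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// BalancedHeadings and parse are hypothetical names used only for this sketch.
class BalancedHeadings {
    // Group 1: the opening run of '='; group 2: the heading text;
    // \1 requires the closing run to repeat the opening run exactly.
    private static final Pattern BALANCED = Pattern.compile("(=+)([^=]+)\\1");

    // Returns entries of the form "<level>:<title>", where level is the number of '=' signs.
    static List<String> parse(String text) {
        List<String> headings = new ArrayList<>();
        Matcher matcher = BALANCED.matcher(text);
        while (matcher.find()) {
            int level = matcher.group(1).length();
            String title = matcher.group(2).trim();
            headings.add(level + ":" + title);
        }
        return headings;
    }

    public static void main(String[] args) {
        // Hypothetical input with two heading levels.
        String text = "==History== intro ===Early years=== more";
        System.out.println(parse(text));
    }
}
```

Because [^=]+ is still used for the heading text, this version treats single = signs elsewhere in the input as delimiters too, so it suits text where = only appears in headings.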
