I have a string that looks like this:
<h3 class="media__title">
<a class="media__link" href="/news/world-europe41644527" rev="video|headline">
The equestrian champion with no legs
</a> </h3>
I tried to extract the text inside the h3 tags using this pattern:
String regex = "<h3>(.+?)</h3>";
The code I'm using:
private ArrayList<String> getValues(String resource) {
    final ArrayList<String> values = new ArrayList<>();
    // compile the regex string and match it against the method argument
    final Matcher matcher = Pattern.compile(regex).matcher(resource);
    while (matcher.find()) {
        values.add(matcher.group(1));
    }
    return values;
}
This code works if I remove the class="media__title" attribute from the h3 tag. I tried changing the regex to this:
String regex = "<h3 class=\"medial__title\">(.+?)</h3>";
but still made no progress. Can someone tell me what should be changed in this regex pattern?
Try this:
String regex = "<h3 (.*)>((.|\\s)+?)<\\/h3>";
The main problem with your approach is that, by default, the . character does not match line terminators, and the text you want spans several lines.
Explained:
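<h3 (.*)> matches an opening h3 tag together with all the attributes it contains (you could also use a different pattern here if you are interested in the attributes themselves)
((.|\s)+?) matches everything inside the h3 tag; (.|\s) means "any character except a line terminator, or any whitespace character", which together covers every character, line breaks included
<\/h3> matches the closing h3 tag (the / is escaped because it is a regex delimiter in some languages; in Java the escape is not required but is harmless)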
Keep in mind that the group you're looking for is now the second group, not the first, so use matcher.group(2).
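To see the pattern in action, here is a minimal, self-contained sketch. The class name H3Extractor and both method names are made up for illustration. The second pattern is not part of the answer above: it is an alternative that assumes you are free to pass the Pattern.DOTALL flag, which makes . match line terminators and avoids the (.|\s) alternation, which may overflow the stack on very long inputs. Note also that the answer's pattern requires a space after h3, so it will not match a bare <h3> tag, while the alternative will.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class H3Extractor {
    // Pattern from the answer as a Java string literal (backslashes doubled).
    // Group 1 captures the attributes, group 2 captures the tag content.
    private static final Pattern ANSWER_PATTERN =
            Pattern.compile("<h3 (.*)>((.|\\s)+?)</h3>");

    // Alternative (an assumption, not from the answer): DOTALL lets '.'
    // match line terminators; [^>]* skips any attributes, including none.
    private static final Pattern DOTALL_PATTERN =
            Pattern.compile("<h3[^>]*>(.+?)</h3>", Pattern.DOTALL);

    static List<String> extractWithAnswerPattern(String html) {
        List<String> values = new ArrayList<>();
        Matcher matcher = ANSWER_PATTERN.matcher(html);
        while (matcher.find()) {
            values.add(matcher.group(2).trim()); // content is the second group
        }
        return values;
    }

    static List<String> extractWithDotAll(String html) {
        List<String> values = new ArrayList<>();
        Matcher matcher = DOTALL_PATTERN.matcher(html);
        while (matcher.find()) {
            values.add(matcher.group(1).trim()); // only one group here
        }
        return values;
    }

    public static void main(String[] args) {
        String html = "<h3 class=\"media__title\">\n"
                + "<a class=\"media__link\" href=\"/news/world-europe41644527\" rev=\"video|headline\">\n"
                + "The equestrian champion with no legs\n"
                + "</a> </h3>";
        System.out.println(extractWithAnswerPattern(html));
        System.out.println(extractWithDotAll(html));
    }
}
```

Both methods return everything between the h3 tags, so the result still contains the inner <a> element. If you need only the visible text, or the markup gets more complex than this, an HTML parser is usually a more robust choice than a regex.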