
Getting the text within an HTML tag that has attributes

I have a string that looks like this:

  <h3 class="media__title"> 
  <a class="media__link" href="/news/world-europe41644527" rev="video|headline">
  The equestrian champion with no legs                                                         
  </a> </h3>

I tried to get the text within the h3 tags using this pattern:

    String regex = "<h3>(.+?)</h3>";

The code I'm using:

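    import java.util.ArrayList;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    private ArrayList<String> getValues(String resource) {
        final ArrayList<String> values = new ArrayList<>();
        // regex is the String defined above; compile it into a Pattern before matching
        final Matcher matcher = Pattern.compile(regex).matcher(resource);
        while (matcher.find()) {
            values.add(matcher.group(1)); // text captured by (.+?)
        }
        return values;
    }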

This code works if I remove the class="media__title" attribute from the h3 tags. I tried changing the regex to this:

    String regex = "<h3 class=\"media__title\">(.+?)</h3>";

but it still doesn't match. What should I change in this regex pattern?

Try this:

    String regex = "<h3 (.*)>((.|\\s)+?)<\\/h3>";

The main problem with your approach is that, by default, the . character does not match line terminators, and the content of your h3 tag spans several lines.
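A minimal sketch of the problem, assuming the input is the question's HTML held in one Java String with \n line breaks (the class DotVsNewline and its helper matchesHeading are hypothetical names). It runs the asker's attribute pattern once in default mode and once with Pattern.DOTALL, the java.util.regex flag that lets . also match line terminators. DOTALL is a swapped-in alternative to the (.|\s) trick this answer uses, not part of the answer itself.

```java
import java.util.regex.Pattern;

// Hypothetical demo class: shows why '.' fails on multi-line h3 content.
class DotVsNewline {
    // Assumed sample input: the question's HTML, with "\n" where the original has line breaks.
    static final String SAMPLE = "<h3 class=\"media__title\"> \n"
            + "<a class=\"media__link\" href=\"/news/world-europe41644527\" rev=\"video|headline\">\n"
            + "The equestrian champion with no legs\n"
            + "</a> </h3>";

    // The asker's attempt, with the class name spelled as it appears in the HTML.
    static final String REGEX = "<h3 class=\"media__title\">(.+?)</h3>";

    // Returns true if REGEX finds a match in html when compiled with the given flags.
    static boolean matchesHeading(String html, int flags) {
        return Pattern.compile(REGEX, flags).matcher(html).find();
    }

    public static void main(String[] args) {
        // Default mode: '.' stops at '\n', so (.+?) can never reach </h3>.
        System.out.println("default: " + matchesHeading(SAMPLE, 0));             // default: false
        // DOTALL mode: '.' also matches line terminators, so the same pattern matches.
        System.out.println("DOTALL:  " + matchesHeading(SAMPLE, Pattern.DOTALL)); // DOTALL:  true
    }
}
```

So the attribute in the pattern is not what breaks the match; the newline between the opening and closing tags is.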

Explained:

<h3 (.*)> matches an opening h3 tag together with all of its attributes (you could use a different pattern if you are interested in the attributes themselves).

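((.|\s)+?) matches everything inside the h3 tag. (.|\s) means "any character except a line terminator, or any whitespace character", and since \n and \r count as whitespace, together they match every character, including newlines. The lazy +? stops at the first closing tag.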

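<\/h3> matches the closing h3 tag. The backslash before / is harmless but not needed in Java, where / is not a regex metacharacter; the escape is only required in languages such as JavaScript or Perl, where / delimits regex literals.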
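The group breakdown above can be checked with a short sketch. It assumes the same sample string as the question (written with \n line breaks); the class AnswerPatternGroups and its helper groups are hypothetical names. The pattern is the answer's regex written as a Java string literal, so every backslash is doubled.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: prints what groups 1 and 2 of the answer's pattern capture.
class AnswerPatternGroups {
    // The answer's regex as a Java literal: "\\s" is \s and "\\/" is the optional escaped '/'.
    static final Pattern H3 = Pattern.compile("<h3 (.*)>((.|\\s)+?)<\\/h3>");

    // Assumed sample input: the question's HTML with "\n" line breaks.
    static final String SAMPLE = "<h3 class=\"media__title\"> \n"
            + "<a class=\"media__link\" href=\"/news/world-europe41644527\" rev=\"video|headline\">\n"
            + "The equestrian champion with no legs\n"
            + "</a> </h3>";

    // Returns {group 1, group 2} of the first match, or null if nothing matches.
    static String[] groups(String html) {
        Matcher m = H3.matcher(html);
        if (!m.find()) {
            return null;
        }
        return new String[] {m.group(1), m.group(2)};
    }

    public static void main(String[] args) {
        String[] g = groups(SAMPLE);
        System.out.println("attributes: " + g[0]);     // attributes: class="media__title"
        System.out.println("content: [" + g[1] + "]"); // the nested <a> element plus surrounding whitespace
    }
}
```

Group 2 is the h3's inner HTML, including the nested <a> tag, not only its visible text.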

Keep in mind that the content you are looking for is now in the second group, so use matcher.group(2) instead of matcher.group(1).
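Applied to the question's method, the fix comes down to the pattern plus one group index. This is a sketch assuming the same sample input; H3TextExtractor and innerText are hypothetical names, and innerText is a crude tag-stripping step added only to show how the visible text could be pulled out of group 2. It is not a real HTML parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class wrapping the question's getValues method with the answer's pattern.
class H3TextExtractor {
    // The answer's pattern: group 1 = attributes, group 2 = content, group 3 = last char of (.|\s).
    private static final Pattern H3 = Pattern.compile("<h3 (.*)>((.|\\s)+?)<\\/h3>");

    // Assumed sample input: the question's HTML with "\n" line breaks.
    static final String SAMPLE = "<h3 class=\"media__title\"> \n"
            + "<a class=\"media__link\" href=\"/news/world-europe41644527\" rev=\"video|headline\">\n"
            + "The equestrian champion with no legs\n"
            + "</a> </h3>";

    // Returns the raw content of every <h3 ...>...</h3> element (group 2, not group 1).
    static List<String> getValues(String resource) {
        final List<String> values = new ArrayList<>();
        final Matcher matcher = H3.matcher(resource);
        while (matcher.find()) {
            values.add(matcher.group(2));
        }
        return values;
    }

    // Hypothetical helper: removes nested tags such as <a ...> and </a>, then trims whitespace.
    static String innerText(String content) {
        return content.replaceAll("<[^>]+>", "").trim();
    }

    public static void main(String[] args) {
        for (String value : getValues(SAMPLE)) {
            System.out.println(innerText(value)); // The equestrian champion with no legs
        }
    }
}
```

The greedy (.*) works here because the > of the h3 tag is the last > on its line, but on single-line HTML containing several h3 elements it can run past the first tag and swallow an earlier heading; a tighter attribute part such as [^>]* avoids that.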

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint this content, please cite the site URL or the original address. For questions, contact: yoyou2525@163.com.

 