
Regex Pattern matching with HTML tags

This is only for a small Android program I am messing with, so I only need to match one or two tags.

I have one HTML tag, and I can get what's inside it, which is "FC Cologne". I use this code to get it:

Pattern pattern = Pattern.compile("report\">(.*?)</a>", Pattern.MULTILINE);

Here is the HTML tag I can get to work:

<a href="/match-menu/3405570/first-team/fc-cologne=report"> FC Cologne</a>
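For reference, here is a minimal sketch of how the working pattern is typically used with java.util.regex: Matcher.find() locates the match anywhere in the input, and group(1) returns the (.*?) capture. The class and method names (ReportLinkExample, extractTeam) are hypothetical. The sketch drops Pattern.MULTILINE because that flag only changes how ^ and $ behave, and this pattern uses neither; if tag contents could span several lines, Pattern.DOTALL (which lets . match line terminators) would be the relevant flag instead.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReportLinkExample {
    // Same pattern as in the question: capture everything between report"> and </a>.
    private static final Pattern REPORT_LINK = Pattern.compile("report\">(.*?)</a>");

    /** Returns the trimmed link text, or null when the pattern does not match. */
    static String extractTeam(String html) {
        Matcher m = REPORT_LINK.matcher(html);
        // find() searches anywhere in the input; matches() would require the whole string to match.
        if (m.find()) {
            // group(1) is the (.*?) capture; trim() drops the leading space before "FC".
            return m.group(1).trim();
        }
        return null;
    }

    public static void main(String[] args) {
        String html = "<a href=\"/match-menu/3405570/first-team/fc-cologne=report\"> FC Cologne</a>";
        System.out.println(extractTeam(html)); // prints FC Cologne
    }
}
```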

But I can't get this tag to match. I don't know whether it is because of the space after the word "opposition", the quotes inside the HTML tag (the first tag has neither), or both.

This is the one I can't get to work:

<td class="bold opposition "> "Olympiacos" </td>

This is the code I am trying:

Pattern pattern = Pattern.compile("opposition \">(.*?)</td>",Pattern.MULTILINE);

I have tried replacing the space with an empty string, and I have also tried \\s where the space is, but I get no match.

I would appreciate any help.

Unless you have a typo in one of the two: in the HTML, < /td> has a space after the <, but </td> in your regex doesn't.

Adding a space to the regex after the < made the match succeed in RegexBuddy.

Update: It seems the space is not in the tag the OP is working with.
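Instead of hard-coding a space, one option is to allow optional whitespace with \s*, so the same pattern works whether or not the closing tag contains the stray space. The sketch below is an assumption layered on the answer, not something it states: OppositionExample and the TOLERANT variant are hypothetical names, and the \s* placements (after "opposition", after "<" and after "/") are one illustrative choice.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OppositionExample {
    // The question's pattern: the closing tag must be exactly "</td>".
    static final Pattern STRICT = Pattern.compile("opposition \">(.*?)</td>");

    // Hypothetical variant: \s* permits optional whitespace after "opposition",
    // after "<" and after "/", so "</td>", "< /td>" and "</ td>" all match.
    static final Pattern TOLERANT = Pattern.compile("opposition\\s*\">(.*?)<\\s*/\\s*td>");

    /** Returns group 1 of the first match, or null if there is none. */
    static String firstGroup(Pattern p, String html) {
        Matcher m = p.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String withSpace = "<td class=\"bold opposition \"> \"Olympiacos\" < /td>";
        // null, because "</td>" never appears in withSpace
        System.out.println(firstGroup(STRICT, withSpace));
        // group 1 is " \"Olympiacos\" ", surrounding spaces included
        System.out.println(firstGroup(TOLERANT, withSpace));
    }
}
```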

In RegexBuddy I have the pattern (copied as a Java String)

"opposition \">(.*?)</td>"

which matches the html

< td class="bold opposition "> "Olympiacos"       </td>

giving a match of

opposition "> "Olympiacos"       </td>

and Group 1 of

 "Olympiacos"       <-- the group ends here, trailing spaces included.
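As the RegexBuddy result shows, group 1 still carries the surrounding whitespace and, if they are literally part of the page text, the double quotes. A small post-processing step can clean that up. The helper name cleanCellText is hypothetical; it trims whitespace and then removes one pair of surrounding double quotes, leaving the text unchanged otherwise.

```java
public class CellTextExample {
    /**
     * Hypothetical helper: trims whitespace, then removes one pair of
     * surrounding double quotes if both are present.
     */
    static String cleanCellText(String raw) {
        String s = raw.trim();
        if (s.length() >= 2 && s.startsWith("\"") && s.endsWith("\"")) {
            s = s.substring(1, s.length() - 1);
        }
        return s;
    }

    public static void main(String[] args) {
        // Group 1 as reported above: leading space, quotes, trailing spaces.
        String group1 = " \"Olympiacos\"       ";
        System.out.println(cleanCellText(group1)); // prints Olympiacos
    }
}
```

Doing this in Java rather than in the regex keeps the pattern simple; the alternative would be to make the quotes optional parts of the pattern outside the capture group.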

This is what you're looking for, I believe:

Pattern pattern = Pattern.compile("<(\\w+)\\s*(?:\\w+(?:=(?:'(?:[^']|(?<=\\\\)')*'|\"(?:[^\"]|(?<=\\\\)\")*\"))?\\s*)*>(.*?)</\\1\\s*>");

Use the second group to get the contents of the tag (the first group is the tag name). Note that this does not work recursively: nested elements are captured inside the second group, so you need to apply this regex again to the second group of each match, repeating until there are no more matches.
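The "apply it again to group 2" step can be sketched as a short recursive method. This is an illustration under assumptions, not code from the answer: NestedTagExample and collect are hypothetical names, and each entry is recorded as "tagName -> contents" purely for display.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NestedTagExample {
    // The answer's regex as a Java string literal: group 1 = tag name, group 2 = contents.
    static final Pattern TAG = Pattern.compile(
        "<(\\w+)\\s*(?:\\w+(?:=(?:'(?:[^']|(?<=\\\\)')*'|\"(?:[^\"]|(?<=\\\\)\")*\"))?\\s*)*>(.*?)</\\1\\s*>");

    /** Records every element found, then re-applies TAG to its group 2 until nothing matches. */
    static void collect(String html, List<String> out) {
        Matcher m = TAG.matcher(html);
        while (m.find()) {                               // sibling elements at this level
            out.add(m.group(1) + " -> " + m.group(2));   // tag name and raw contents
            collect(m.group(2), out);                    // nested elements live inside group 2
        }
    }

    public static void main(String[] args) {
        List<String> out = new ArrayList<>();
        collect("<div><td class=\"bold opposition \"> \"Olympiacos\" </td></div>", out);
        out.forEach(System.out::println);
    }
}
```

Here the div is found first and its contents are searched again, which yields the td. Because (.*?) is lazy, it stops at the first matching closing tag, so an element nested directly inside another element with the same name (a div inside a div) gets cut short; without Pattern.DOTALL, contents that span lines are not matched either.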

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint one, please cite the site URL or the original address. For any questions, contact: yoyou2525@163.com.
