
What is a proper regex to find all the variations of the HTML <td> using Java?

I am trying to practice my skills by loading a formatted HTML table into a Java matrix.

The problem is that I am working with regexes, and they aren't behaving the way I expect.

For example, for the line:

<TD ALIGN="CENTER" colspan="14"><B class="useNavy">Computer Science</B><br></tr>

I am trying to "clean" the code by turning <TD ALIGN="CENTER" colspan="14"> into a plain <td>.

I use the following code where row contains that line:

row = row.replaceAll("<(td|TD)(.*)?>", "<td>");

I am expecting to get:

<td><B class="useNavy">Computer Science</B><br></tr>

But instead I get a single

<td>

What is wrong with my regex?

I thought I should tell the program to stop at the first match, but using replaceFirst doesn't seem to work either.

I tried the following variations of the regex, but the same thing happens:

"<(td|TD).*>", "<(td|TD)(.*)>"

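<(td|TD)[^>]*> should match every opening td tag in your document.

[^>]* is the key part. It means "match as many characters as you can find that aren't the closing greater-than character", so the match stops at the first >. In your version, .* is greedy and also matches > characters, so it runs on to the last > in the line. That is why the whole row collapses to a single <td>, and why replaceFirst makes no difference: there is only one (very long) match.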
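To make the difference concrete, here is a minimal sketch that runs the question's pattern and the negated-character-class pattern against the sample row. The class name TdCleaner and its method names are hypothetical, chosen only for illustration. The third method is an assumption on my part rather than something from the question: it uses the (?i) flag and a \b word boundary so that mixed-case tags such as <Td> are covered and longer names such as <tdata> are left alone.

```java
class TdCleaner {
    // Pattern from the question: the greedy ".*" also consumes '>' characters,
    // so the match extends to the LAST '>' on the line.
    static String greedy(String row) {
        return row.replaceAll("<(td|TD)(.*)?>", "<td>");
    }

    // Pattern from the answer: "[^>]*" cannot cross a '>',
    // so the match ends at the FIRST '>' after "<td" or "<TD".
    static String clean(String row) {
        return row.replaceAll("<(td|TD)[^>]*>", "<td>");
    }

    // Assumed variant (not in the original post): case-insensitive,
    // with a word boundary so only tags named exactly "td" match.
    static String cleanAnyCase(String row) {
        return row.replaceAll("(?i)<td\\b[^>]*>", "<td>");
    }

    public static void main(String[] args) {
        String row = "<TD ALIGN=\"CENTER\" colspan=\"14\">"
                + "<B class=\"useNavy\">Computer Science</B><br></tr>";

        System.out.println(greedy(row));
        // prints: <td>

        System.out.println(clean(row));
        // prints: <td><B class="useNavy">Computer Science</B><br></tr>

        System.out.println(cleanAnyCase("<Td width=\"1\">x"));
        // prints: <td>x
    }
}
```

For anything beyond a practice exercise, an HTML parser is usually more robust than regexes, since attribute values may themselves contain a > character.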

Use this simple regex pattern:

String p="(\\\\.td\\\\.B\\\\sclass.*)";

Hope this helps
