I am trying to practice my skills by loading a formatted HTML table into a Java matrix.
The problem is that I am working with regexes, and unfortunately they aren't behaving the way I want.
For example, for the line:
<TD ALIGN="CENTER" colspan="14"><B class="useNavy">Computer Science</B><br></tr>
I am trying to "clean" the code by turning <TD ALIGN="CENTER" colspan="14"> into a plain <td>.
I use the following code where row contains that line:
row = row.replaceAll("<(td|TD)(.*)?>", "<td>");
I am expecting to get:
<td><B class="useNavy">Computer Science</B><br></tr>
But instead I get a single
<td>
What is wrong with my regex?
I thought I should tell the program to stop at the first match, but using replaceFirst doesn't seem to work either.
I tried the following variations of the regex, but the same thing happens:
"<(td|TD).*>", "<(td|TD)(.*)>"
<(td|TD)[^>]*>
should match every opening td tag in your document.
[^>]*
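is the key part. It means "match as many characters as you can that aren't the closing greater-than sign", so the match stops at the first > after the tag name. With .* the match is greedy and runs to the last > on the line, which is why the whole row collapses into a single <td>.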
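As a rough illustration of the difference, here is a minimal sketch that runs both patterns against the row from the question. The class name TdCleaner and the methods greedy and cleanTd are made up for this example. It also assumes you are happy to use Pattern.CASE_INSENSITIVE in place of (td|TD), and it adds a \b word boundary after td, which is not part of the pattern above.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, named only for this example.
class TdCleaner {
    // [^>]* cannot cross a '>', so the match ends at the tag's own closing bracket.
    // CASE_INSENSITIVE and \b are assumptions added for this sketch.
    private static final Pattern TD_OPEN =
            Pattern.compile("<td\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    // The original attempt: .* is greedy and runs to the last '>' on the line.
    static String greedy(String row) {
        return row.replaceAll("<(td|TD)(.*)?>", "<td>");
    }

    // The negated character class: each opening td tag is replaced on its own.
    static String cleanTd(String row) {
        return TD_OPEN.matcher(row).replaceAll("<td>");
    }

    public static void main(String[] args) {
        String row = "<TD ALIGN=\"CENTER\" colspan=\"14\">"
                + "<B class=\"useNavy\">Computer Science</B><br></tr>";
        System.out.println(greedy(row));   // prints <td>
        System.out.println(cleanTd(row));  // prints <td><B class="useNavy">Computer Science</B><br></tr>
    }
}
```

replaceFirst does not help with the original pattern because there is only one match to begin with: the greedy .* makes that single match cover the whole line.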
Use this simple regex pattern:
String p="(\\\\.td\\\\.B\\\\sclass.*)";
Hope this helps