From Jenkins, I'm using the Confluence API to get the content of a page as HTML, like this:
<tr>
<td>bla1a</td>
<td>bla2a</td>
<td>bla3a</td>
</tr>
<tr>
<td>bla1b</td>
<td>what I’m searching</td>
<td>bla3b</td>
</tr>
<tr>
<td>bla1c</td>
<td>bla2c</td>
<td>bla3c</td>
</tr>
What I want is to update the content of one particular row of a table, where I only know the value of one string in that row (in this case “what I’m searching”). So what I need is a regex that matches an entire table row containing the searched string:
<tr> … what I’m searching …</tr>
and returns the entire row, as follows:
<tr>
<td>bla1b</td>
<td>what I’m searching</td>
<td>bla3b</td>
</tr>
Don't use regex to extract data from HTML or to manipulate it. Mandatory links: "You can't parse [X]HTML with regex" and why-its-not-possible-to-use-regex-to-parse-html-xml-a-formal-explanation. Use a proper parser instead, for example Jsoup. Jsoup provides a convenient, intuitive API for extracting and manipulating HTML data. Its selector syntax is described under selector-syntax and in the Selector documentation. Using Jsoup, your code could look like this:
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
public class Example {
public static void main(String[] args) {
String html =
"<html>\n"
+ "<head></head>"
+ "<body>"
+ " <table>"
+ " <tr>\n"
+ " <td>bla1a</td>\n"
+ " <td>bla2a</td>\n"
+ " <td>bla3a</td>\n"
+ " </tr>\n"
+ " <tr>\n"
+ " <td>bla1b</td>\n"
+ " <td>what I’m searching</td>\n"
+ " <td>bla3b</td>\n"
+ " </tr>\n"
+ " <tr>\n"
+ " <td>bla1c</td>\n"
+ " <td>bla2c</td>\n"
+ " <td>bla3c</td>\n"
+ " </tr>"
+ " </table>"
+ "</body>\n"
+ "</html>";
Document doc = Jsoup.parse(html);
Element result = doc.selectFirst("tr:contains(what I’m searching)");
System.out.println(result);
}
}
output:
<tr>
<td>bla1b</td>
<td>what I’m searching</td>
<td>bla3b</td>
</tr>
You can also easily manipulate your HTML. Continuing from the code above, where result is the matched row:
Element td = result.selectFirst("td:contains(what I’m searching)");
td.text("My updated data");
System.out.println(result);
output:
<tr>
<td>bla1b</td>
<td>My updated data</td>
<td>bla3b</td>
</tr>
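Since the goal in the question is to send the updated page back, here is a rough sketch of the whole round trip: parse, update one cell, serialize. `CellUpdater` and `updateCell` are hypothetical names for illustration, not part of Jsoup. It assumes the page body contains a complete `<table>`. It compares the cell text exactly instead of using `:contains`, which does a case-insensitive substring match, and it returns null when no cell matches.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

final class CellUpdater {
    /**
     * Replaces the text of the first <td> whose text equals cellText.
     * Returns the serialized body, or null if no such cell exists.
     */
    static String updateCell(String pageHtml, String cellText, String newText) {
        Document doc = Jsoup.parse(pageHtml);
        for (Element td : doc.select("td")) {
            // text() is whitespace-normalized and trimmed, so indentation does not matter
            if (td.text().equals(cellText)) {
                td.text(newText);          // edits the node inside doc in place
                return doc.body().html();  // markup as normalized by Jsoup
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // \u2019 is the typographic apostrophe used in the sample data
        String page = "<table><tr><td>bla1b</td><td>what I\u2019m searching</td>"
                + "<td>bla3b</td></tr></table>";
        System.out.println(updateCell(page, "what I\u2019m searching", "My updated data"));
    }
}
```

Note that Jsoup normalizes the markup while parsing, so the returned string can differ from the input in whitespace and in implied elements such as `<tbody>`.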
Maven dependency:
<!-- https://mvnrepository.com/artifact/org.jsoup/jsoup -->
<dependency>
<groupId>org.jsoup</groupId>
<artifactId>jsoup</artifactId>
<version>1.15.2</version>
</dependency>
For a rather simple lookup like yours, you don't really need any external tools; a simple regex will do just fine.
It is also likely to be faster and less resource-hungry than building a full document tree.
I'd put it like this (Groovy):
String txt = '''\
<tr>
<td>bla1a</td>
<td>bla2a</td>
<td>bla3a</td>
</tr>
<tr>
<td>bla1b</td>
<td>what I’m searching</td>
<td>bla3b</td>
</tr>
<tr>
<td>bla1c</td>
<td>bla2c</td>
<td>bla3c</td>
</tr>'''
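// The \s* after <tr> and after the searched cell lets the row match even when that cell is the first or last one
List res = ( txt =~ /<tr>\s*(<td>[\w\s]+<\/td>\s*)*<td>what I’m searching<\/td>\s*(<td>[\w\s]+<\/td>\s*)*<\/tr>/ ).findAll()*.first()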
assert res == ['''<tr>
<td>bla1b</td>
<td>what I’m searching</td>
<td>bla3b</td>
</tr>''']
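The snippet above only finds the row, while the question is about updating it. A minimal sketch of the same idea in plain Java (`java.util.regex`, no external library) that also replaces the cell text could look like this. `RegexRowUpdater`, `rowWith` and `updateCell` are made-up names. The sketch assumes every cell is a bare `<td>` with no attributes or nested tags, and it uses `[^<]*` instead of `[\w\s]+` so that other cells may contain punctuation.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RegexRowUpdater {
    // One plain cell (no attributes, no nested tags) plus trailing whitespace
    private static final String CELL = "<td>[^<]*</td>\\s*";

    /** Pattern for a row that contains a cell whose text is exactly cellText. */
    static Pattern rowWith(String cellText) {
        // Group 1: row start up to the searched cell; group 2: rest of the row
        return Pattern.compile(
                "(<tr>\\s*(?:" + CELL + ")*<td>)"
                + Pattern.quote(cellText)
                + "(</td>\\s*(?:" + CELL + ")*</tr>)");
    }

    /** Replaces the cell text in the first matching row; returns html unchanged if none matches. */
    static String updateCell(String html, String cellText, String newText) {
        Matcher m = rowWith(cellText).matcher(html);
        // quoteReplacement keeps '$' and '\' in newText from being read as group references
        return m.replaceFirst("$1" + Matcher.quoteReplacement(newText) + "$2");
    }

    public static void main(String[] args) {
        String html = "<tr>\n<td>bla1a</td>\n</tr>\n<tr>\n<td>bla1b</td>\n<td>target</td>\n</tr>";
        System.out.println(updateCell(html, "target", "My updated data"));
    }
}
```

`Pattern.quote` matters here: without it, a search string containing characters such as `(` or `.` would be interpreted as regex syntax. As soon as the real page has attributes on `<tr>` or `<td>`, or markup inside cells, this pattern stops matching.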