How to edit all text values in HTML tags using jsoup
What I want: I am new to jsoup. I want to parse my HTML string and find every text value that appears inside a tag (any tag), then change that text value to something else.
What I have done: I am able to change the text value for a single tag. Below is the code:
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public static void main(String[] args) {
    String html = "<div><p>Test Data</p> <p>HELLO World</p></div>";
    Document doc1 = Jsoup.parse(html);
    // Select only the <p> elements
    Elements ps = doc1.getElementsByTag("p");
    for (Element p : ps) {
        String pText = p.text();
        p.text(base64_Dummy(pText));
    }
    System.out.println("======================");
    String changedHTML = doc1.html();
    System.out.println(changedHTML);
}

// Placeholder for the real text transformation
public static String base64_Dummy(String abc) {
    return "This is changed text";
}
Output:
======================
<html>
<head></head>
<body>
<div>
<p>This is changed text</p>
<p>This is changed text</p>
</div>
</body>
</html>
The code above changes the text of every p tag. In my case, however, the HTML string can contain any tag, and I want to find and change the text of all of them. How can I search all tags in an HTML string and change their text values one by one?
You can try something similar to this code:
String html = "<html><body><div><p>Test Data</p> <div> <p>HELLO World</p></div></div> other text</body></html>";
Document doc = Jsoup.parse(html);
// We will search nodes in a breadth-first way
Queue<Node> nodes = new ArrayDeque<>();
nodes.addAll(doc.childNodes());
while (!nodes.isEmpty()) {
    Node n = nodes.remove();
    if (n instanceof TextNode && ((TextNode) n).text().trim().length() > 0) {
        // Do whatever you want with n.
        // Here we just print its text...
        System.out.println(n.parent().nodeName() + " contains text: " + ((TextNode) n).text().trim());
    } else {
        // Not a non-blank text node: queue its children for later
        nodes.addAll(n.childNodes());
    }
}
You'll get the following output:
body contains text: other text
p contains text: Test Data
p contains text: HELLO World
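The loop above only prints each text node. As a minimal sketch of the same idea that rewrites the text in place, one option is jsoup's NodeVisitor traversal, which walks the tree depth-first rather than with the breadth-first queue above. This assumes jsoup is on the classpath; the class name TextNodeRewriter and the transform helper are hypothetical stand-ins, not part of the original code.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;
import org.jsoup.select.NodeVisitor;

public class TextNodeRewriter {

    // Hypothetical placeholder for the real transformation
    static String transform(String text) {
        return "This is changed text";
    }

    // Parses the HTML and replaces every non-blank text node
    static Document rewrite(String html) {
        Document doc = Jsoup.parse(html);
        doc.traverse(new NodeVisitor() {
            @Override
            public void head(Node node, int depth) {
                if (node instanceof TextNode) {
                    TextNode tn = (TextNode) node;
                    // Leave whitespace-only nodes between tags untouched
                    if (!tn.isBlank()) {
                        tn.text(transform(tn.text().trim()));
                    }
                }
            }

            @Override
            public void tail(Node node, int depth) {
                // Nothing to do when leaving a node
            }
        });
        return doc;
    }

    public static void main(String[] args) {
        String html = "<div><p>Test Data</p> <div> <p>HELLO World</p></div></div> other text";
        System.out.println(rewrite(html).body().html());
    }
}
```

Changing a text node's content does not alter the tree structure, so it should be safe to do during traversal.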
You want to use the CSS selector * together with the textNodes method to get the text of a given tag (an Element in jsoup terms).
The line below
Elements ps = doc1.getElementsByTag("p");
becomes
Elements ps = doc1.select("*");
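Now, with this new selector, you select every element (tag) in your HTML code. Because textNodes() returns only the text nodes that are direct children of an element, each piece of text is handled once, under the tag that directly contains it.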
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.TextNode;
import org.jsoup.select.Elements;

public static void main(String[] args) {
    String html = "<html><body><div><p>Test Data</p> <div> <p>HELLO World</p></div></div> other text</body></html>";
    Document doc1 = Jsoup.parse(html);
    // "*" matches every element in the document
    Elements tags = doc1.select("*");
    for (Element tag : tags) {
        // Only the text nodes directly under this element
        for (TextNode tn : tag.textNodes()) {
            String tagText = tn.text().trim();
            // Skip whitespace-only text nodes between tags
            if (tagText.length() > 0) {
                tn.text(base64_Dummy(tagText));
            }
        }
    }
    System.out.println("======================");
    String changedHTML = doc1.html();
    System.out.println(changedHTML);
}

// Placeholder for the real text transformation
public static String base64_Dummy(String abc) {
    return "This is changed text";
}
Output:
======================
<html>
<head></head>
<body>
<div>
<p>This is changed text</p>
<div>
<p>This is changed text</p>
</div>
</div>This is changed text
</body>
</html>
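The placeholder name base64_Dummy suggests that the real goal may be to Base64-encode each text value. That is an assumption, since the question never says so. Under that assumption, a sketch of a replacement helper using only the standard library could look like this (the class name Base64Text and the decode method are illustrative additions):

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class Base64Text {

    // Encodes the text as Base64 over its UTF-8 bytes
    static String encode(String text) {
        return Base64.getEncoder().encodeToString(text.getBytes(StandardCharsets.UTF_8));
    }

    // Reverses encode(): Base64 string back to the original text
    static String decode(String encoded) {
        return new String(Base64.getDecoder().decode(encoded), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String encoded = encode("Test Data");
        System.out.println(encoded);
        System.out.println(decode(encoded));
    }
}
```

In the loop above, tn.text(base64_Dummy(tagText)) would then become tn.text(Base64Text.encode(tagText)).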