How to edit all text values in HTML tags using jsoup

What I want: I am new to jsoup. I want to parse my HTML string and find every text value that appears inside a tag (any tag), and then change that text value to something else.

What I have done: I am able to change the text value for a single tag. Below is the code:

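import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class Main {
    public static void main(String[] args) {
        String html = "<div><p>Test Data</p> <p>HELLO World</p></div>";
        Document doc1 = Jsoup.parse(html);
        // Only <p> elements are selected here
        Elements ps = doc1.getElementsByTag("p");
        for (Element p : ps) {
            String pText = p.text();
            p.text(base64_Dummy(pText));
        }
        System.out.println("======================");
        String changedHTML = doc1.html();
        System.out.println(changedHTML);
    }

    // Placeholder for the real transformation of the text
    public static String base64_Dummy(String abc) {
        return "This is changed text";
    }
}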

Output:

======================
<html>
 <head></head>
 <body>
  <div>
   <p>This is changed text</p> 
   <p>This is changed text</p>
  </div>
 </body>
</html>

The code above changes the text of the p tags. But in my case the HTML string can contain any tag, and I want to find and change the text of all of them. How can I search all tags in the HTML string and change their text values one by one?

You can try something similar to this code:

import java.util.ArrayDeque;
import java.util.Queue;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

public class Main {
    public static void main(String[] args) {
        String html = "<html><body><div><p>Test Data</p> <div> <p>HELLO World</p></div></div> other text</body></html>";

        Document doc = Jsoup.parse(html);

        // We will search nodes in a breadth-first way,
        // starting from the document's direct children
        Queue<Node> nodes = new ArrayDeque<>(doc.childNodes());

        while (!nodes.isEmpty()) {
            Node n = nodes.remove();

            if (n instanceof TextNode && ((TextNode) n).text().trim().length() > 0) {
                // Do whatever you want with n.
                // Here we just print its text...
                System.out.println(n.parent().nodeName() + " contains text: " + ((TextNode) n).text().trim());
            } else {
                // Not a non-blank text node: enqueue its children (if any)
                nodes.addAll(n.childNodes());
            }
        }
    }
}

And you'll get the following output:

body contains text: other text
p contains text: Test Data
p contains text: HELLO World
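The loop above only prints each text node. A minimal sketch of the same breadth-first walk that also rewrites the text could look like the following. The class name `BreadthFirstTextRewriter` and the helper `rewriteAllText` are hypothetical names introduced for illustration; the only jsoup calls used are `Jsoup.parse`, `Node.childNodes`, `TextNode.text()` and `TextNode.text(String)`.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.function.UnaryOperator;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

public class BreadthFirstTextRewriter {

    // Hypothetical helper: applies `transform` to every non-blank text node
    // and returns the HTML of the resulting <body>.
    static String rewriteAllText(String html, UnaryOperator<String> transform) {
        Document doc = Jsoup.parse(html);
        Queue<Node> queue = new ArrayDeque<>(doc.childNodes());

        while (!queue.isEmpty()) {
            Node n = queue.remove();
            if (n instanceof TextNode) {
                TextNode tn = (TextNode) n;
                String trimmed = tn.text().trim();
                // Skip whitespace-only nodes such as the gap between two tags
                if (!trimmed.isEmpty()) {
                    tn.text(transform.apply(trimmed));
                }
            } else {
                // Elements (and any other node type) contribute their children
                queue.addAll(n.childNodes());
            }
        }
        return doc.body().html();
    }

    public static void main(String[] args) {
        String html = "<div><p>Test Data</p> <p>HELLO World</p></div>";
        System.out.println(rewriteAllText(html, s -> "This is changed text"));
    }
}
```

Passing the transformation as a `UnaryOperator<String>` is a design choice for the sketch; a real Base64 encoder or any other function could be supplied in place of the constant lambda.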

You want to use the CSS selector * together with the method textNodes to get the text of a given tag (an Element in jsoup terms).

This line:

Elements ps = doc1.getElementsByTag("p");

becomes:

Elements ps = doc1.select("*");

With this new selector you can select every element (tag) in your HTML code.
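One reason to pair * with textNodes rather than calling Element.text(String) on every selected element: according to the jsoup Javadoc, Element.text(String) clears any existing contents of the element, including child elements. The sketch below contrasts the two on a small input; the class name `OwnTextDemo` and both helper methods are hypothetical and exist only for this illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.TextNode;

public class OwnTextDemo {

    // Sets the text on the outer element itself, which discards its children.
    static int childCountAfterElementText(String html) {
        Document doc = Jsoup.parse(html);
        Element outer = doc.body().child(0);
        outer.text("changed");
        return outer.children().size();
    }

    // Changes only the element's own text nodes, leaving child elements alone.
    static int childCountAfterTextNodes(String html) {
        Document doc = Jsoup.parse(html);
        Element outer = doc.body().child(0);
        for (TextNode tn : outer.textNodes()) {
            if (!tn.isBlank()) {
                tn.text("changed");
            }
        }
        return outer.children().size();
    }

    public static void main(String[] args) {
        String html = "<div>intro <p>Test Data</p></div>";
        System.out.println(childCountAfterElementText(html));
        System.out.println(childCountAfterTextNodes(html));
    }
}
```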

FULL CODE EXAMPLE

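import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.TextNode;
import org.jsoup.select.Elements;

public class Main {
    public static void main(String[] args) {
        String html = "<html><body><div><p>Test Data</p> <div> <p>HELLO World</p></div></div> other text</body></html>";
        Document doc1 = Jsoup.parse(html);
        // "*" matches every element in the document
        Elements tags = doc1.select("*");
        for (Element tag : tags) {
            // textNodes() returns only the text nodes that are direct children of this element
            for (TextNode tn : tag.textNodes()) {
                String tagText = tn.text().trim();

                // Skip whitespace-only text nodes
                if (tagText.length() > 0) {
                    tn.text(base64_Dummy(tagText));
                }
            }
        }
        System.out.println("======================");
        String changedHTML = doc1.html();
        System.out.println(changedHTML);
    }

    // Placeholder for the real transformation of the text
    public static String base64_Dummy(String abc) {
        return "This is changed text";
    }
}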

OUTPUT

======================
<html>
 <head></head>
 <body>
  <div>
   <p>This is changed text</p> 
   <div> 
    <p>This is changed text</p>
   </div>
  </div>This is changed text
 </body>
</html>
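As a possible alternative to both approaches, jsoup also has a built-in depth-first traversal: `traverse` with a `NodeVisitor`. The following is a sketch under the assumption that every non-blank text node should receive the same replacement; the class name `VisitorTextRewriter` and the method `rewriteAllText` are hypothetical. Both `head` and `tail` are implemented so the code does not rely on `tail` having a default implementation, which only newer jsoup releases provide.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;
import org.jsoup.select.NodeVisitor;

public class VisitorTextRewriter {

    // Hypothetical helper: replaces every non-blank text node with `replacement`
    // and returns the HTML of the resulting <body>.
    static String rewriteAllText(String html, String replacement) {
        Document doc = Jsoup.parse(html);
        doc.traverse(new NodeVisitor() {
            @Override
            public void head(Node node, int depth) {
                // Called when a node is first reached, before its children
                if (node instanceof TextNode && !((TextNode) node).isBlank()) {
                    ((TextNode) node).text(replacement);
                }
            }

            @Override
            public void tail(Node node, int depth) {
                // Nothing to do when leaving a node
            }
        });
        return doc.body().html();
    }

    public static void main(String[] args) {
        String html = "<div><p>Test Data</p> <div> <p>HELLO World</p></div></div> other text";
        System.out.println(rewriteAllText(html, "This is changed text"));
    }
}
```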
