How can I replace "text" in each tag using Jsoup?

I have the following HTML:

<html>
<head>
</head>
<body>
    <div id="content" >
         <p>text <strong>text</strong> text <em>text</em> text </p>
    </div>
</body>    
</html>

How can I replace "text" with "word" in each tag using the Jsoup library? I want to get:

<html>
<head>
</head>
<body>
    <div id="content" >
         <p>word <strong>word</strong> word <em>word</em> word </p>
    </div>
</body>    
</html>

Thank you for any suggestions!

UPD: Thanks for the answers, but I found a more versatile way:

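    import org.jsoup.nodes.Element;
    import org.jsoup.nodes.Node;
    import org.jsoup.nodes.TextNode;
    import org.jsoup.select.Elements;

    // doc is an already parsed org.jsoup.nodes.Document
    Element entry = doc.select("div").first();
    Elements tags = entry.getAllElements(); // entry itself plus all of its descendants
    for (Element tag : tags) {
        for (Node child : tag.childNodes()) {
            if (child instanceof TextNode && !((TextNode) child).isBlank()) {
                TextNode textNode = (TextNode) child;
                System.out.println(textNode); // text
                // replace inside the node so the surrounding whitespace is kept
                textNode.text(textNode.getWholeText().replace("text", "word"));
            }
        }
    }

    // url is the address of the page to fetch
    Document doc = Jsoup.connect(url).get();
    String str = doc.toString();
    str = str.replace("text", "word");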

Try it.
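
One caveat with replacing on the serialized document: String.replace knows nothing about markup, so it also rewrites attribute values, ids and anything else that happens to contain the same characters. A minimal sketch of the problem in plain Java (the sample markup and the class name ReplacePitfall are made up for illustration, they are not from the question):

```java
public class ReplacePitfall {
    // Naive approach: treat the whole document as one string.
    static String naiveReplace(String html) {
        return html.replace("text", "word");
    }

    public static void main(String[] args) {
        // Hypothetical input where the class attribute also contains "text".
        String html = "<p class=\"text\">text <strong>text</strong></p>";
        System.out.println(naiveReplace(html));
        // prints: <p class="word">word <strong>word</strong></p>
    }
}
```

For the HTML in the question this makes no difference, but on a real page it is probably safer to replace only inside text nodes.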

A quick search turned up this code:

    Elements strongs = doc.select("strong");
    Element f = strongs.first();
    Element l = strongs.last();
    // ... 1, siblings.lastIndexOf(l));

etc.

First, understand how the library works and what features it offers; then figure out how to use it to do what you need. The code above seems to let you select a strong element, at which point you could update its inner text, but I'm sure there are a number of other ways to accomplish the same thing.

In general, most libraries that parse XML can select any given element in the document object model, or any list of elements, and manipulate either the elements themselves or their inner text, attributes and the like.
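
With Jsoup, that select-then-manipulate pattern might look roughly like the sketch below. The class name SelectAndEdit, the sample markup and the "replaced" class value are assumptions for illustration only; select, first, text and attr are regular Jsoup calls.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class SelectAndEdit {
    static Document edit(String html) {
        Document doc = Jsoup.parse(html);
        // Select the first <strong> inside the paragraph.
        Element strong = doc.select("div#content > p > strong").first();
        if (strong != null) {
            // Replace its inner text...
            strong.text("word");
            // ...and set an attribute in the same way.
            strong.attr("class", "replaced");
        }
        return doc;
    }

    public static void main(String[] args) {
        String html = "<div id=\"content\"><p>text <strong>text</strong> text</p></div>";
        System.out.println(edit(html).body().html());
    }
}
```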

Once you gain more experience working with different libraries, your starting point should be the library's documentation, to see what the library does. If a method says it does something, that's what it does, and you can expect to use it to accomplish that goal. Then, instead of writing a question on Stack Overflow, you just need to read through the functionality of the library you're using and figure out how to use it to do what you want.

    String html = "<html> ...";
    Document doc = Jsoup.parse(html);
    Elements p = doc.select("div#content > p");
    p.html(p.html().replaceAll("text", "word"));
    System.out.println(doc.toString());

div#content > p selects the <p> elements that are direct children of the <div> whose id is content.

If you want to replace the text only in <strong>text</strong>:

    Elements p = doc.select("div#content > p > strong");
    p.html(p.html().replaceAll("text", "word"));
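
Note that Elements.html() returns the combined inner HTML of all matched elements, while Elements.html(String) sets the inner HTML of each of them, so the snippets above are only safe when the selector matches a single element. A sketch that loops over the matches and touches only the text nodes, so tags and attributes stay as they are, could look like this (the class name ReplaceInTextNodes, its parameters and the sample markup are hypothetical):

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.TextNode;

public class ReplaceInTextNodes {
    static Document replaceText(String html, String selector, String from, String to) {
        Document doc = Jsoup.parse(html);
        for (Element root : doc.select(selector)) {
            // getAllElements() includes root itself and all of its descendants.
            for (Element el : root.getAllElements()) {
                // textNodes() returns only the text nodes that are direct children of el.
                for (TextNode node : el.textNodes()) {
                    node.text(node.getWholeText().replace(from, to));
                }
            }
        }
        return doc;
    }

    public static void main(String[] args) {
        String html = "<div id=\"content\">"
                + "<p class=\"text\">text <strong>text</strong> text <em>text</em> text </p>"
                + "</div>";
        Document doc = replaceText(html, "div#content", "text", "word");
        System.out.println(doc.body().html());
    }
}
```

The selector is assumed to match elements that are not nested inside each other; otherwise the inner text nodes would be visited more than once.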
