How to replace custom HTML with actual HTML tags using JSoup?

I am using the following version of JSoup (along with Java 1.7):

<dependency>
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
    <version>1.11.3</version>
</dependency>

My code:

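import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class HtmlTagUtils {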

    private static String mockHtml = "<asset-entity type=\"photo\" id=\"1236ad76-7433-fs34-50d1-b12bdbc308899af\">"
            + "</asset-entity>\r\nAngelie Jolie was seen at Wholefoods with ex-beau Brad Pitt.\r\n <asset-entity type=\"photo\" id=\"2346fe7d-c175-c380-4ab2-dda068b42b033dvf\">"
            + "</asset-entity>\r\n- The majority of their kids were with them.\r\n<asset-entity type=\"video\" id=\"45064086-5d85-1866-4afc-a523c04c2b3e43b6\"> </asset-entity>\r\n";

    public static List<String> extractIdsForPhotos(String html) {
        Document doc = Jsoup.parse(html);
        Elements elements = doc.select("asset-entity[type=photo]");
        List<String> photos = new ArrayList<>();
        for (Element element : elements) {
            // Only the id is needed; the type is already fixed by the selector.
            photos.add(element.attr("id"));
        }
        return photos;
    } 

    public static List<String> extractIdsForVideos(String html) {
        Document doc = Jsoup.parse(html);
        Elements elements = doc.select("asset-entity[type=video]");
        List<String> videos = new ArrayList<>();
        for (Element element : elements) {
            // Only the id is needed; the type is already fixed by the selector.
            videos.add(element.attr("id"));
        }
        return videos;
    }

    public static void main(String[] args) {
        List<String> photoIds = extractIdsForPhotos(mockHtml);
        for (String photoId : photoIds) {
            System.out.println("\n\tphotoId: " + photoId);
        }

        List<String> videoIds = extractIdsForVideos(mockHtml);
        for (String videoId : videoIds) {
            System.out.println("\n\tvideoId: " + videoId);
        }
    }       
}

This prints the following to stdout:

photoId: 1236ad76-7433-fs34-50d1-b12bdbc308899af
photoId: 2346fe7d-c175-c380-4ab2-dda068b42b033dvf
videoId: 45064086-5d85-1866-4afc-a523c04c2b3e43b6

I am able to find the necessary assets based on these ids, but my question is how to replace the entire tag (along with its contents), inline, using JSoup. For example, for photos, how do I replace:

<asset-entity type="photo" id="4806ad76-7433-fs34-50d1-b12bdbc308899ad"></asset-entity>

with:

<img src="AngelinaJolie.jpg"> 

So the converted HTML would look like this:

"<img src="AngelinaJolie.jpg">\r\nAngelie Jolie was seen at Wholefoods with ex-beau Brad Pitt.\r\n <img src="BradPitt.jpg">
\r\n- The majority of their kids were with them.\r\n<video><source src="Brangelina.mp4" type="video/mp4"></video>\r\n";

Can anyone point me in the right direction?

You can change the element's tag name with tagName(...) and then replace its attributes with your own:

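        Document doc = Jsoup.parse(html);
        // Disable pretty-printing so the original whitespace and line breaks are kept.
        doc.outputSettings().prettyPrint(false);
        Elements elements = doc.select("asset-entity[type=photo]");
        for (Element element : elements) {
            // Rename the custom tag, drop its old attributes, and set the image source.
            element.tagName("img");
            element.removeAttr("type");
            element.removeAttr("id");
            element.attr("src", "AngelinaJolie.jpg");
        }
        // Note: doc.html() returns the whole document, including the <html>, <head>,
        // and <body> wrappers that Jsoup.parse adds; doc.body().html() returns only the body content.
        String formattedHtml = doc.html();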
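
The snippet above gives every photo the same src. If each asset needs its own file, and videos need a nested source tag, one option is to build a new element and swap it in with replaceWith. The following is only a sketch under assumptions: the class name AssetEntityReplacer, the method replaceAssets, and the srcById map (asset id to file name) are hypothetical stand-ins for however you look up assets by id. It uses Jsoup.parseBodyFragment and doc.body().html() so the result does not include the html/head/body wrappers.

```java
import java.util.HashMap;
import java.util.Map;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class AssetEntityReplacer {

    // Replaces each <asset-entity> whose id is found in srcById with an <img> or <video>.
    // srcById is a hypothetical lookup (asset id -> file name); plug in your own asset resolution.
    public static String replaceAssets(String html, Map<String, String> srcById) {
        Document doc = Jsoup.parseBodyFragment(html);
        doc.outputSettings().prettyPrint(false); // keep the original whitespace

        for (Element entity : doc.select("asset-entity[id]")) {
            String src = srcById.get(entity.attr("id"));
            if (src == null) {
                continue; // unknown id: leave the custom tag untouched
            }
            String type = entity.attr("type");
            if ("photo".equals(type)) {
                // The new element replaces the custom tag together with its contents.
                entity.replaceWith(doc.createElement("img").attr("src", src));
            } else if ("video".equals(type)) {
                Element video = doc.createElement("video");
                // The MIME type is assumed to be video/mp4, as in the question's example.
                video.appendElement("source").attr("src", src).attr("type", "video/mp4");
                entity.replaceWith(video);
            }
        }
        return doc.body().html();
    }

    public static void main(String[] args) {
        // Ids and file names taken from the question's mock data.
        Map<String, String> srcById = new HashMap<>();
        srcById.put("1236ad76-7433-fs34-50d1-b12bdbc308899af", "AngelinaJolie.jpg");
        srcById.put("45064086-5d85-1866-4afc-a523c04c2b3e43b6", "Brangelina.mp4");

        String html = "<asset-entity type=\"photo\" id=\"1236ad76-7433-fs34-50d1-b12bdbc308899af\"></asset-entity>"
                + " Some text. "
                + "<asset-entity type=\"video\" id=\"45064086-5d85-1866-4afc-a523c04c2b3e43b6\"> </asset-entity>";

        System.out.println(replaceAssets(html, srcById));
    }
}
```

Because replaceWith swaps the whole node, any attributes or child nodes of the original asset-entity are dropped automatically, so there is no need to call removeAttr for each one.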
