
How to replace custom HTML with actual HTML tags using JSoup?

I am using the following version of JSoup (with Java 1.7):

<dependency>
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
    <version>1.11.3</version>
</dependency>

My code:

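import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class HtmlTagUtils {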

    private static String mockHtml = "<asset-entity type=\"photo\" id=\"1236ad76-7433-fs34-50d1-b12bdbc308899af\">"
+ "</asset-entity>\r\nAngelie Jolie was seen at Wholefoods with ex-beau Brad Pitt.\r\n <asset-entity type=\"photo\" id=\"2346fe7d-c175-c380-4ab2-dda068b42b033dvf\">"
+ "</asset-entity>\r\n- The majority of their kids were with them.\r\n<asset-entity type=\"video\" id=\"45064086-5d85-1866-4afc-a523c04c2b3e43b6\"> </asset-entity>\r\n";

    public static List<String> extractIdsForPhotos(String html) {
        Document doc = Jsoup.parse(html);
        Elements elements = doc.select("asset-entity[type=photo]");
        List<String> photos = new ArrayList<>();
        for (Element element : elements) {
            String id = element.attr("id");
            photos.add(id);
        }
        return photos;
    } 

    public static List<String> extractIdsForVideos(String html) {
        Document doc = Jsoup.parse(html);
        Elements elements = doc.select("asset-entity[type=video]");
        List<String> videos = new ArrayList<>();
        for (Element element : elements) {
            String id = element.attr("id");
            videos.add(id);
        }
        return videos;
    }

    public static void main(String[] args) {
        List<String> photoIds = extractIdsForPhotos(mockHtml);
        for (String photoId : photoIds) {
            System.out.println("\n\tphotoId: " + photoId);
        }

        List<String> videoIds = extractIdsForVideos(mockHtml);
        for (String videoId : videoIds) {
            System.out.println("\n\tvideoId: " + videoId);
        }
    }       
}

This prints the following to stdout:

photoId: 1236ad76-7433-fs34-50d1-b12bdbc308899af
photoId: 2346fe7d-c175-c380-4ab2-dda068b42b033dvf
videoId: 45064086-5d85-1866-4afc-a523c04c2b3e43b6

I can find the necessary assets from these ids, but my question is how to replace each tag in place (along with its contents) using JSoup. For example, for photos, replace:

<asset-entity type="photo" id="4806ad76-7433-fs34-50d1-b12bdbc308899ad"></asset-entity>

with:

<img src="AngelinaJolie.jpg"> 

So the converted HTML would look like this:

<img src="AngelinaJolie.jpg">\r\nAngelie Jolie was seen at Wholefoods with ex-beau Brad Pitt.\r\n <img src="BradPitt.jpg">\r\n- The majority of their kids were with them.\r\n<video><source src="Brangelina.mp4" type="video/mp4"></video>\r\n

Can anyone point me in the right direction?

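You can change the element's tag name with tagName() and then swap its attributes for the ones you need. Jsoup.parse() wraps the input in <html>, <head>, and <body>, so read the result from doc.body() to get only the converted fragment.

        Document doc = Jsoup.parse(html);
        // Keep the original whitespace instead of re-indenting the output.
        doc.outputSettings().prettyPrint(false);
        Elements elements = doc.select("asset-entity[type=photo]");
        for (Element element : elements) {
            // Rename the tag in place, then drop the old attributes.
            element.tagName("img");
            element.removeAttr("type");
            element.removeAttr("id");
            // Hard-coded for illustration: every photo gets the same src.
            element.attr("src","AngelinaJolie.jpg");
        }
        // body().html() returns just the fragment, without the <html>/<head>/<body> wrapper.
        String formattedHtml = doc.body().html();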
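
Renaming the tag works for photos, but a video needs a nested <source> element and each asset needs its own file. One way to handle both is to build a new element and swap it in with replaceWith(). The following is only a sketch: the class name AssetEntityReplacer, the method replaceAssets, and the srcById map (asset id to file name) are made up for illustration, and the lookup would come from wherever you resolve your assets.

```java
import java.util.HashMap;
import java.util.Map;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class AssetEntityReplacer {

    // Hypothetical helper: swaps each <asset-entity> for an <img> or <video>,
    // using srcById (an assumed id -> file name lookup) to find the source.
    public static String replaceAssets(String html, Map<String, String> srcById) {
        // Parse the input as body content rather than a full page.
        Document doc = Jsoup.parseBodyFragment(html);
        // Keep the original whitespace instead of re-indenting the output.
        doc.outputSettings().prettyPrint(false);

        for (Element entity : doc.select("asset-entity[id]")) {
            String src = srcById.get(entity.attr("id"));
            if (src == null) {
                continue; // unknown id: leave the placeholder untouched
            }
            String type = entity.attr("type");
            Element replacement;
            if ("photo".equals(type)) {
                replacement = doc.createElement("img").attr("src", src);
            } else if ("video".equals(type)) {
                replacement = doc.createElement("video");
                // The video/mp4 MIME type is assumed here for every video.
                replacement.appendElement("source")
                        .attr("src", src)
                        .attr("type", "video/mp4");
            } else {
                continue; // unsupported type: leave it as-is
            }
            // Swap the new element in at the same position in the DOM.
            entity.replaceWith(replacement);
        }
        // Return only the fragment, without the <html>/<head>/<body> wrapper.
        return doc.body().html();
    }

    public static void main(String[] args) {
        Map<String, String> srcById = new HashMap<>();
        srcById.put("1236ad76-7433-fs34-50d1-b12bdbc308899af", "AngelinaJolie.jpg");
        srcById.put("45064086-5d85-1866-4afc-a523c04c2b3e43b6", "Brangelina.mp4");

        String html = "<asset-entity type=\"photo\" id=\"1236ad76-7433-fs34-50d1-b12bdbc308899af\"></asset-entity>"
                + " Some text. "
                + "<asset-entity type=\"video\" id=\"45064086-5d85-1866-4afc-a523c04c2b3e43b6\"> </asset-entity>";
        System.out.println(replaceAssets(html, srcById));
    }
}
```

Since select() returns its own list of matches, replacing elements inside the loop should be safe. Entities whose id is missing from the map are left untouched, so you can spot unresolved assets in the output.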
