
Parse HTML using Jsoup - need a specific pattern

I am trying to get the text between tags and save it into variables. For example, in the HTML below I want to save the value "return", which is between the em tags. I also need the rest of the text in the p tag: the em value should be "return", and the p value should be only "an item, cancel an order, print a receipt, track your purchases or reorder items." If some text comes before the em tag, that text should also go into its own variable. Basically, if one p contains multiple tags, it should be split and each part saved into a different variable. If I knew how to get the text that is not inside the inner tags, I could retrieve the rest myself.

I have written the code below, but it returns only "return", which is inside the em tags. Here ep is doc.select("p"): I select the p tags and then iterate over them. I'm not sure I'm doing it the right way; any other approaches are highly appreciated.

String text = "<p><em>return </em>an item, cancel an order, print a receipt, track your purchases or reorder items.</p>";
Document doc = Jsoup.parse(text);
Elements ep = doc.select("p");

Elements italic_tags = ep.select("em");
for (Element em : italic_tags) {
    // prints only the text inside <em>, i.e. "return"
    System.out.println(em.text());
}
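For the narrower case of separating the em text from the rest of the paragraph, Jsoup's Element.ownText() may already be enough: it returns only the text nodes that are direct children of the element and skips text inside nested tags. A minimal sketch, assuming the single-em paragraph from the question; the class and helper names (OwnTextExample, emText, restText) are hypothetical:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

public class OwnTextExample {
    // Hypothetical helper: text inside all <em> descendants of the first <p>,
    // joined with single spaces
    static String emText(String html) {
        Element p = Jsoup.parse(html).selectFirst("p");
        return p.select("em").text();
    }

    // Hypothetical helper: text that belongs directly to the first <p>,
    // ignoring any text inside nested tags such as <em>
    static String restText(String html) {
        Element p = Jsoup.parse(html).selectFirst("p");
        return p.ownText();
    }

    public static void main(String[] args) {
        String text = "<p><em>return </em>an item, cancel an order, print a receipt, track your purchases or reorder items.</p>";
        String em = emText(text);
        String rest = restText(text);
        System.out.println(em);
        System.out.println(rest);
    }
}
```

With the question's input, em should hold "return" and rest should hold "an item, cancel an order, print a receipt, track your purchases or reorder items." Note that ownText() concatenates all of the paragraph's direct text pieces into one string, so when a p contains several em tags you lose which piece followed which tag; for that you need to walk the child nodes one by one.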

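If you need to get each piece of text, both the plain text and the text enclosed in different tags, you need to iterate over Node objects instead of Element objects: childNodes() also returns the TextNode pieces that sit between the tags, whereas selecting elements only gives you the tags themselves. I modified your HTML to include more tags so the example is more complete: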

import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

public class ChildNodesExample {
    public static void main(String[] args) {
        String text = "<p><em>return </em>an item, <em>cancel</em> an order, <em>print</em> a receipt, <em>track</em> your purchases or reorder items.</p>";
        Document doc = Jsoup.parse(text);

        Element ep = doc.selectFirst("p");
        List<Node> childNodes = ep.childNodes();
        for (Node node : childNodes) {
            if (node instanceof TextNode) {
                // if it's a text node, just display it
                System.out.println(node);
            } else {
                // if it's another element, display its first
                // child, which in this case is a text node
                System.out.println(node.childNode(0));
            }
        }
    }
}

output:

return 
an item, 
cancel
 an order, 
print
 a receipt, 
track
 your purchases or reorder items.
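The loop above only prints each piece. Since the question asks to save the pieces into separate variables, including any text that comes before the first em tag, one option is to collect them into a list in document order, tagging each entry with Node.nodeName() ("#text" for plain text, the tag name such as "em" for an element). A sketch under those assumptions; SplitParagraph and split are hypothetical names, and the input adds a leading "Please " purely to show text before an em tag. It uses Element.text() instead of node.childNode(0), so elements with no children (such as br) or with further nested tags do not throw:

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

public class SplitParagraph {
    // Hypothetical helper: splits the first <p> into its direct child pieces,
    // in document order. Each entry is {nodeName, text}, where nodeName is
    // "#text" for plain text or the tag name (e.g. "em") for an element.
    static List<String[]> split(String html) {
        Element p = Jsoup.parse(html).selectFirst("p");
        List<String[]> parts = new ArrayList<>();
        for (Node node : p.childNodes()) {
            String piece;
            if (node instanceof TextNode) {
                // plain text between tags, with surrounding spaces removed
                piece = ((TextNode) node).text().trim();
            } else if (node instanceof Element) {
                // all text inside the element, however deeply nested
                piece = ((Element) node).text();
            } else {
                continue; // skip comments, data nodes, etc.
            }
            if (!piece.isEmpty()) {
                parts.add(new String[] {node.nodeName(), piece});
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        String text = "<p>Please <em>return </em>an item, <em>cancel</em> an order.</p>";
        for (String[] part : split(text)) {
            System.out.println(part[0] + " -> " + part[1]);
        }
    }
}
```

Each entry can then be assigned to its own variable, or filtered by nodeName if you only want the em pieces or only the plain text.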
