Parsing a String with key-value pairs using the Jsoup library

I am stuck on how to parse the following data into key-value pairs. Please guide me.

<div class="content">
    <div class="label">Company Name: </div>
    Cartell Chemical Co., Ltd.
    <br/>
    <div class="label">Business Owner: </div>
    Michael Chen
    <br/>
    <div class="label">Employees: </div>
    210
    <br/>
    <div class="label">Main markets: </div>
    North America, Europe, China, South Asia
    <br/>
    <div class="label">Business Type: </div>
    Manufacturer
    <br/>
</div>

I need the output in the following format. Please guide me using Java with the Jsoup library.

Company Name:Cartell Chemical Co., Ltd.
Business Owner:Michael Chen
Employees:210
Main markets:North America, Europe, China, South Asia
Business Type:Manufacturer

Have a look at the documentation.

Here's a working example:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;
import org.jsoup.select.Elements;

public class StackOverflow20973268 {
    private static String input = "<div class=\"content\">" +
            "<div class=\"label\">Company Name: </div>" +
            "Cartell Chemical Co., Ltd." +
            "<br/>" +
            "<div class=\"label\">Business Owner: </div>" +
            "Michael Chen" +
            "<br/>" +
            "<div class=\"label\">Employees: </div>" +
            "210" +
            "<br/>" +
            "<div class=\"label\">Main markets: </div>" +
            "North America, Europe, China, South Asia" +
            "<br/>" +
            "<div class=\"label\">Business Type: </div>" +
            "Manufacturer" +
            "<br/>" +
            "</div>";

    public static void main(String[] args) {
        Document doc = Jsoup.parse(input);
        // Each label div is immediately followed by a text node holding its value.
        Elements labels = doc.select("div.content div.label");
        for (Element label : labels) {
            Node next = label.nextSibling();
            String value = (next instanceof TextNode) ? ((TextNode) next).text().trim() : "";
            // label.text() already ends with ':', so no extra separator is added.
            System.out.println(label.text().trim() + value);
        }
    }
}

Output:

Company Name:Cartell Chemical Co., Ltd.
Business Owner:Michael Chen
Employees:210
Main markets:North America, Europe, China, South Asia
Business Type:Manufacturer
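
If the pairs are needed as data rather than printed lines, the same selector can feed a map. This is only a minimal sketch, assuming each label div is directly followed by a text node holding its value; the class name LabelPairs and the method toMap are made up for illustration and are not part of Jsoup.

```java
import java.util.LinkedHashMap;
import java.util.Map;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

// Hypothetical helper for illustration; not part of Jsoup.
final class LabelPairs {

    // Maps each label (without its trailing colon) to the text that follows it.
    static Map<String, String> toMap(String html) {
        Map<String, String> pairs = new LinkedHashMap<>(); // keeps document order
        for (Element label : Jsoup.parse(html).select("div.content div.label")) {
            String key = label.text().trim();
            if (key.endsWith(":")) {
                key = key.substring(0, key.length() - 1).trim();
            }
            // Assumption: the value is the text node right after the label div.
            Node next = label.nextSibling();
            String value = (next instanceof TextNode) ? ((TextNode) next).text().trim() : "";
            pairs.put(key, value);
        }
        return pairs;
    }

    public static void main(String[] args) {
        String html = "<div class=\"content\">"
                + "<div class=\"label\">Employees: </div>210<br/>"
                + "<div class=\"label\">Business Type: </div>Manufacturer<br/>"
                + "</div>";
        toMap(html).forEach((k, v) -> System.out.println(k + ":" + v));
    }
}
```

A LinkedHashMap is used so that iteration follows the order of the labels in the page. If a label is followed by an element instead of plain text, this sketch stores an empty string for it.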

The Jsoup library is very good for parsing HTML. It lets you extract values by class or id name, or by traversing the DOM tree. You basically get a div element and look at its children, which can be text nodes (containing the text to be parsed) or other elements that have children of their own. For example, you could do something like this (untested, partly pseudocode):

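    Document doc = Jsoup.parse(info);
    Elements divs = doc.body().getElementsByTag("div");
    for (Element divElement : divs) {
        // Text held directly by this div (use textNodes() for the individual nodes)
        String ownText = divElement.ownText();
        // The node right after this div: a TextNode, another Element, or null
        Node next = divElement.nextSibling();
    }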

Basically, you find elements and then either step into them for their text or move on to the next/previous node.
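
The stepping described above can also be done from the parent: walk the child nodes of the content div in order, remember each label element, and pair it with the next non-empty text node. This is a rough sketch under the assumption that the markup looks like the question's sample; NodeWalk and lines are illustrative names, not Jsoup API.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

// Hypothetical helper for illustration; not part of Jsoup.
final class NodeWalk {

    // Returns one "Label:value" string per label found inside div.content.
    static List<String> lines(String html) {
        List<String> out = new ArrayList<>();
        Element content = Jsoup.parse(html).select("div.content").first();
        if (content == null) {
            return out; // no matching container in the input
        }
        String pendingLabel = null;
        for (Node child : content.childNodes()) {
            if (child instanceof Element && ((Element) child).hasClass("label")) {
                // Step into the label element for its text (it ends with ':').
                pendingLabel = ((Element) child).text().trim();
            } else if (child instanceof TextNode && pendingLabel != null) {
                // The text node after a label is treated as that label's value.
                String value = ((TextNode) child).text().trim();
                if (!value.isEmpty()) {
                    out.add(pendingLabel + value);
                    pendingLabel = null;
                }
            }
        }
        return out;
    }
}
```

Walking childNodes() rather than children() matters here, because children() returns only elements and would skip the bare text that holds each value.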
