Java: find a table using jsoup and the equivalent XPath

Here is the HTML code:

<table class="textfont" cellspacing="0" cellpadding="0" width="100%" align="center" border="0">
    <tbody>
        <tr>
            <td class="chl" width="20%">Batch ID</td><td class="ctext">d32654464bdb424396f6a91f2af29ecf</td>
        </tr>
        <tr>
            <td class="chl" width="20%">ALM Server</td>
            <td class="ctext"></td>
        </tr>
        <tr>
            <td class="chl" width="20%">ALM Domain/Project</td>
            <td class="ctext">EBUSINESS/STERLING</td>
        </tr>
        <tr>
            <td class="chl" width="20%">TestSet URL</td>
            <td class="ctext">almtestset://<a href="http://localhost.com">localhost</a></td>
        </tr>
        <tr>
            <td class="chl" width="20%">Tests Executed</td>
            <td class="ctext"><b>6</b></td>
        </tr>
        <tr>
            <td class="chl" width="20%">Start Time</td>
            <td class="ctext">08/31/2017 12:20:46 PM</td>
        </tr>
        <tr>
            <td class="chl" width="20%">Finish Time</td>
            <td class="ctext">08/31/2017 02:31:46 PM</td>
        </tr>
        <tr>
            <td class="chl" width="20%">Total Duration</td>
            <td class="ctext"><b>2h 11m </b></td>
        </tr>
        <tr>
            <td class="chl" width="20%">Test Parameters</td>
            <td class="ctext"><b>{&quot;browser&quot;:&quot;chrome&quot;,&quot;browser-version&quot;:&quot;56&quot;,&quot;language&quot;:&quot;english&quot;,&quot;country&quot;:&quot;US&quot;}</b></td>
        </tr>
        <tr>
            <td class="chl" width="20%">Passed</td>
            <td class="ctext" style="color:#269900"><b>0</b></td>
        </tr>
        <tr>
            <td class="chl" width="20%">Failed</td>
            <td class="ctext" style="color:#990000"><b>6</b></td>
        </tr>
        <tr>
            <td class="chl" width="20%">Not Completed</td>
            <td class="ctext" style="color: ##ff8000;"><b>0</b></td>
        </tr>
        <tr>
            <td class="chl" width="20%">Test Pass %</td>
            <td class="ctext" style="color:#990000;font-size:14px"><b>0.0%</b></td>
        </tr>
    </tbody>
</table>

And here is the XPath to get the table:

//td[text() = 'TestSet URL']/ancestor::table[1]
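For reference, here is a minimal sketch of what that XPath selects, using the JDK's built-in XPath engine rather than jsoup. The class name, the `XML` sample and the `id` attributes are hypothetical and exist only for illustration; the sample is a small well-formed fragment, not the full page above. It wraps the table of interest in an outer table to show that `ancestor::table[1]` picks the nearest enclosing table.

```java
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import java.io.StringReader;

public class XPathAncestorDemo {

    // Hypothetical well-formed sample: an outer table wrapping the table we want.
    static final String XML =
            "<table id=\"outer\"><tr><td>"
          + "<table id=\"inner\"><tr>"
          + "<td>TestSet URL</td><td>almtestset://localhost</td>"
          + "</tr></table>"
          + "</td></tr></table>";

    /** Returns the id attribute of the first node matched by the XPath, or null if nothing matches. */
    static String matchedTableId(String xml, String xpath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Element table = (Element) XPathFactory.newInstance()
                    .newXPath()
                    .evaluate(xpath, doc, XPathConstants.NODE);
            return table == null ? null : table.getAttribute("id");
        } catch (Exception e) {
            // Parsing and XPath errors are wrapped to keep the sketch short.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The td must have a text node equal to 'TestSet URL'; [1] on the
        // ancestor axis means the closest enclosing table.
        System.out.println(matchedTableId(XML, "//td[text() = 'TestSet URL']/ancestor::table[1]"));
        // prints "inner"
    }
}
```

Note that the predicate compares the cell's text for equality, so a cell that merely contains "TestSet URL" as part of a longer string would not match.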

How can I get this table using jsoup? I've tried:

tableElements = doc.select("td:contains('TestSet URL')");

to get the child element, but that doesn't work and returns null. I need to find the table and put all the children into a map. Any help would be greatly appreciated!

The following code will parse your table into a map. It rests on a few assumptions:

  • The XPath //td[text() = 'TestSet URL']/ancestor::table[1] selects the nearest enclosing table of a td whose text is 'TestSet URL'. The jsoup code in getTable() approximates it by returning the first table that contains the text "TestSet URL" anywhere in its body. This is a little brittle, but assuming it is sufficient for you, it serves the same purpose as that XPath.
  • The code below assumes that every row contains two cells, the first being the key and the second being the value. Since you want to parse the table content into a map, this assumption seems valid.
  • The code below throws an exception if the above assumptions are not met, i.e. if the given HTML does not contain a table with "TestSet URL" in its body, or if any row within that table does not contain exactly two cells.

If those assumptions are invalid then the internals of getTable and parseTable will change, but the general approach will remain valid.

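import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.util.HashMap;
import java.util.Map;

public void parseTable() {
    // 'html' is assumed to be a String holding the page source shown in the question
    Document doc = Jsoup.parse(html);

    // holder for the 'mapped rows'; a map, on the assumption that every row represents a discrete key:value pair
    Map<String, String> asMap = new HashMap<>();
    Element table = getTable(doc);

    // now walk through the rows, adding one map entry per row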
    Elements rows = table.select("tr");
    for (int i = 0; i < rows.size(); i++) {
        Element row = rows.get(i);
        Elements cols = row.select("td");

        // expecting this table to consist of key:value pairs where the first cell is the key and the second cell is the value
        if (cols.size() == 2) {
            asMap.put(cols.get(0).text(), cols.get(1).text());
        } else {
            throw new RuntimeException(String.format("Cannot parse the table row: %s to a key:value pair because it contains %s cells!", row.text(), cols.size()));
        }
    }
    System.out.println(asMap);
}

private Element getTable(Document doc) {
    Elements tables = doc.select("table");
    for (int i = 0; i < tables.size(); i++) {
        // the XPath //td[text() = 'TestSet URL']/ancestor::table[1] selects the table enclosing the
        // td whose text is 'TestSet URL'; this crude check (the first table whose text contains
        // "TestSet URL" anywhere in its body) is the jsoup approximation of that XPath
        if (tables.get(i).text().contains("TestSet URL")) {
            return tables.get(i);
        }
    }
    throw new RuntimeException("Cannot find a table element which contains 'TestSet URL'!");
}

For the HTML posted in your question, the above code will output:

{Finish Time=08/31/2017 02:31:46 PM, Passed=0, Test Parameters={"browser":"chrome","browser-version":"56","language":"english","country":"US"}, TestSet URL=almtestset://localhost, Failed=6, Test Pass %=0.0%, Not Completed=0, Start Time=08/31/2017 12:20:46 PM, Total Duration=2h 11m, Tests Executed=6, ALM Domain/Project=EBUSINESS/STERLING, Batch ID=d32654464bdb424396f6a91f2af29ecf, ALM Server=}    
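If a closer match to the XPath is needed, the following sketch shows one way to do it with jsoup. It looks for a td whose own text equals the label, then walks up to the nearest enclosing table. The class name, the method names and the shortened `SAMPLE_HTML` (with a hypothetical outer wrapper table) are assumptions made for illustration. It uses a LinkedHashMap, so the entries stay in row order, unlike the HashMap output above.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.util.LinkedHashMap;
import java.util.Map;

public class TableByCellText {

    // Shortened, hypothetical sample: the table from the question nested in an outer table.
    static final String SAMPLE_HTML =
            "<table class=\"outer\"><tr><td>"
          + "<table class=\"textfont\"><tbody>"
          + "<tr><td class=\"chl\">Batch ID</td><td class=\"ctext\">d32654464bdb424396f6a91f2af29ecf</td></tr>"
          + "<tr><td class=\"chl\">TestSet URL</td>"
          + "<td class=\"ctext\">almtestset://<a href=\"http://localhost.com\">localhost</a></td></tr>"
          + "<tr><td class=\"chl\">Tests Executed</td><td class=\"ctext\"><b>6</b></td></tr>"
          + "</tbody></table>"
          + "</td></tr></table>";

    /** Returns the nearest table enclosing the first td whose own text equals label, or null. */
    static Element findTable(Document doc, String label) {
        for (Element td : doc.select("td")) {
            if (td.ownText().trim().equals(label)) {
                // parents() is ordered from the closest ancestor outwards
                for (Element ancestor : td.parents()) {
                    if (ancestor.tagName().equals("table")) {
                        return ancestor;
                    }
                }
            }
        }
        return null;
    }

    /** Maps first cell to second cell for every two-cell row, keeping row order. */
    static Map<String, String> toMap(Element table) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Element row : table.select("tr")) {
            Elements cells = row.select("td");
            if (cells.size() == 2) {
                result.put(cells.get(0).text(), cells.get(1).text());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(SAMPLE_HTML);
        Element table = findTable(doc, "TestSet URL");
        System.out.println(table == null ? "no matching table" : toMap(table));
    }
}
```

Unlike parseTable() above, this sketch silently skips rows that do not have exactly two cells instead of throwing. Which behaviour is preferable depends on how strictly the table layout should be enforced.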

You have to remove the quotation marks to match the cell with that text; just use

tableElements = doc.select("td:contains(TestSet URL)");

but note that with the above you are only selecting td elements which contain the text "TestSet URL". To select the whole table use

Element table = doc.select("table.textfont").first();

which means: select the table with class=textfont. Because several tables can share the same class value, you have to specify which one to choose, hence first().

To get all the tr elements:

Elements tableRows = doc.select("table.textfont tr");
for (Element e : tableRows) {
    System.out.println(e);
}

