
Parsing HTML table data with XPath and Selenium in Java

I want to extract the data and organize it without the tags. The HTML looks something like this:

<table class="SpecTable">
    <col width="40%" />
    <col width="60%" />
    <tr>
        <td class="LightRowHead">Optical Zoom:</td>
        <td class="LightRow">15x</td>
    </tr>
    <tr>
        <td class="DarkRowHead">Digital Zoom:</td>
        <td class="DarkRow">6x</td>
    </tr>
    <tr>
        <td class="LightRowHead">Battery Type:</td>
        <td class="LightRow">Alkaline</td>
    </tr>
    <tr>
        <td class="DarkRowHead">Resolution Megapixels:</td>
        <td class="DarkRow">14 MP</td>
    </tr>
</table>

I want to extract all the strings so that I can store them in a plaintext file containing just this:

Optical Zoom: 15x
Digital Zoom: 6x
Battery Type: Alkaline
Resolution Megapixels: 14 MP

public static void main(String[] args) {

        FirefoxProfile profile = new FirefoxProfile();
        profile.setPreference("general.useragent.override", "some UA string");
        WebDriver driver = new FirefoxDriver(profile);

        String Url = "http://www.walmart.com/ip/Generic-14-MP-X400-BK/19863348";
        driver.get(Url);
        List<WebElement> resultsDiv = driver.findElements(By.xpath("//table[contains (@class,'SpecTable')//td"));

        System.out.println(resultsDiv.size());
        for (int i=0; i<resultsDiv.size(); i++) {
            System.out.println(i+1 + ". " + resultsDiv.get(i).getText());
        }

I am programming in Java with Selenium and I cannot figure out the correct XPath expression for it.

Can someone figure out why this fails and maybe give me some pointers on how to parse this data correctly? I'm very new to Selenium and XPath, but I need this for work.

Also, if anyone has good sources for learning Selenium and XPath quickly, those would be greatly appreciated!

The spec is, surprisingly, a very good read on XPath.

You might also try CSS selectors.

Anyway, one way to get the data from a table is the following:

// gets all rows
List<WebElement> rows = driver.findElements(By.xpath("//table[@class='SpecTable']//tr"));
// for every line, store both columns
for (WebElement row : rows) {
    WebElement key = row.findElement(By.xpath("./td[1]"));
    doAnythingWithText(key.getText());
    WebElement val = row.findElement(By.xpath("./td[2]"));
    doAnythingWithText(val.getText());
}
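
The row-by-row idea can be tried without a browser. The sketch below is not Selenium code: it assumes the markup is well-formed XML (the sample table happens to be) and evaluates the same expressions with the JDK's built-in `javax.xml.xpath`. The class name `SpecTableXPath`, the `SAMPLE` constant, and both helper methods are made up for illustration. It also evaluates the expression from the question with the missing `]` restored, since the original predicate `[contains (@class,'SpecTable')` is never closed.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class SpecTableXPath {
    // The table from the question, trimmed to two rows (hypothetical test input).
    static final String SAMPLE =
        "<table class=\"SpecTable\">"
      + "<col width=\"40%\" /><col width=\"60%\" />"
      + "<tr><td class=\"LightRowHead\">Optical Zoom:</td><td class=\"LightRow\">15x</td></tr>"
      + "<tr><td class=\"DarkRowHead\">Digital Zoom:</td><td class=\"DarkRow\">6x</td></tr>"
      + "</table>";

    // Parses the markup as XML; this only works for well-formed input.
    static Document parse(String markup) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(markup)));
        } catch (Exception e) {
            throw new IllegalStateException("markup is not well-formed XML", e);
        }
    }

    // One "label value" line per table row, using the relative td[1] / td[2] paths.
    static List<String> extractLines(String markup) {
        try {
            Document doc = parse(markup);
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList rows = (NodeList) xpath.evaluate(
                "//table[@class='SpecTable']//tr", doc, XPathConstants.NODESET);
            List<String> lines = new ArrayList<>();
            for (int i = 0; i < rows.getLength(); i++) {
                Node row = rows.item(i);
                String key = xpath.evaluate("./td[1]", row);
                String val = xpath.evaluate("./td[2]", row);
                lines.add(key.trim() + " " + val.trim());
            }
            return lines;
        } catch (Exception e) {
            throw new IllegalStateException("could not evaluate XPath", e);
        }
    }

    // Counts the cells matched by the question's expression once the predicate is closed with ']'.
    static int countCells(String markup) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            Double n = (Double) xpath.evaluate(
                "count(//table[contains(@class,'SpecTable')]//td)",
                parse(markup), XPathConstants.NUMBER);
            return n.intValue();
        } catch (Exception e) {
            throw new IllegalStateException("could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        extractLines(SAMPLE).forEach(System.out::println);
        System.out.println(countCells(SAMPLE) + " cells");
    }
}
```

Real pages are often not well-formed XML, so in practice the expressions would still be passed to Selenium's `By.xpath`, as in the answer above.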

This will probably suit your needs:

String text = driver.findElement(By.cssSelector("table.SpecTable")).getText();

The string text will contain all the text from the table with class SpecTable. I prefer CSS selectors because they are supported by IE and are faster than XPath. As for XPath tutorials, try this and this.

As another option, you could grab all the cells of the table into one collection and access them that way. For example, in C#:

ReadOnlyCollection<IWebElement> Cells = driver.FindElements(By.XPath("//table[@class='SpecTable']//tr//td"));

This gives you all the cells in that table as a collection, which you can then index to read the text of each cell in turn.

string forOutput = Cells[i].Text;
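
Since the cells come back in document order (label, value, label, value, and so on), the flat collection can be folded back into one line per row. Here is a minimal Java sketch of that pairing step. It works on plain strings standing in for the cell texts, and the class name `CellPairs` and method `toLines` are made up for illustration. With Selenium's Java bindings the strings would come from calling `getText()` on each `WebElement` returned by `driver.findElements(By.xpath(...))`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CellPairs {
    // Joins a flat list of cell texts (label, value, label, value, ...) into one line per row.
    // A trailing unpaired cell, if any, is ignored.
    static List<String> toLines(List<String> cells) {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i + 1 < cells.size(); i += 2) {
            lines.add(cells.get(i) + " " + cells.get(i + 1));
        }
        return lines;
    }

    public static void main(String[] args) {
        // Stand-in for the texts of the eight <td> cells in the question's table.
        List<String> cells = Arrays.asList(
            "Optical Zoom:", "15x",
            "Digital Zoom:", "6x",
            "Battery Type:", "Alkaline",
            "Resolution Megapixels:", "14 MP");
        toLines(cells).forEach(System.out::println);
    }
}
```

This pairing assumes every row has exactly two cells. If some rows span columns or hold extra cells, the row-by-row approach is the safer one.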

A C# method to extract any table into a two-dimensional array:

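// Reads the spec table into a [row, column] array of cell texts.
private string[,] getYourSpecTable(){
    return getArrayBy(By.CssSelector("table.SpecTable tr"), By.CssSelector("td"));
}

// Generic helper: rowsBy locates the rows, columnsBy locates the cells within each row.
private string[,] getArrayBy(By rowsBy, By columnsBy){
    bool init=false;
    int nbRow=0, nbCol=0;
    string[,] ret = null;
    ReadOnlyCollection<OpenQA.Selenium.IWebElement> rows = this.webDriver.FindElements(rowsBy);
    nbRow = rows.Count;
    for(int r=0;r<nbRow;r++) {
        ReadOnlyCollection<OpenQA.Selenium.IWebElement> cols = rows[r].FindElements(columnsBy);
        // Size the array from the first row; its cell count fixes the number of columns.
        if(!init) {
            init= true;
            nbCol = cols.Count;
            ret = new string[rows.Count, cols.Count];
        }
        // Rows with fewer cells than the first leave their remaining entries null.
        for(int c=0;c<nbCol && c<cols.Count;c++) {
            ret[r, c] = cols[c].Text;
        }
    }
    // Null when no rows matched.
    return ret;
}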
