Better way to retrieve column data from a web table using WebDriver

I'm trying to fetch data from a table into a List<List<String>> in Java. The code below works, but it takes more than 20 seconds to fetch the data. Is there a faster way to fetch data from a table?

List<WebElement> rows = table.findElements(By.xpath(".//tbody//tr//td//.."));
List<ArrayList<String>> rowsData = new ArrayList<ArrayList<String>>();

for(WebElement row:rows){
    List<WebElement> rowElements = row.findElements(By.xpath(".//td"));

    ArrayList<String> rowData = new ArrayList<String>();

    for(WebElement column:rowElements){
        rowData.add(column.getText().toString());
    }

    rowsData.add(rowData);
}

return rowsData;

I think JSoup is a better option for parsing larger HTML. Its API is quite similar to Selenium's.

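// Needs java.util.ArrayList, java.util.List, org.jsoup.Jsoup,
// org.jsoup.nodes.Document, org.jsoup.nodes.Element, org.jsoup.select.Elements

// outerHTML keeps the <table> wrapper, so the HTML parser keeps the row and cell tags.
String html = driver.findElement(By.tagName("table")).getAttribute("outerHTML");
// A list of rows preserves the table order and allows rows with identical content.
List<List<String>> rowsData = new ArrayList<>();

// Jsoup.parse() takes an HTML string; Jsoup.connect() expects a URL.
Document document = Jsoup.parse(html);
Elements rows = document.select("table tr");

for (Element row : rows) {
    List<String> rowData = new ArrayList<>();

    for (Element cell : row.select("td")) {
        rowData.add(cell.text());
    }

    rowsData.add(rowData);
}

return rowsData;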

First of all, your question surprises me a little: how does that XPath work? You have . in your XPaths, and as far as I know Selenium does need the leading . for an XPath to be evaluated relative to an element. Anyway, to answer your question:

  1. If you can use any locator other than XPath, use it; it should reduce the execution time. Your code runs an XPath lookup on every loop iteration, and Selenium has to search the HTML document for each one, so the execution time adds up.

  2. If you cannot use any locator other than XPath, you can disable the implicit wait before performing the operation above. Your code does not perform any action, such as a click, that refreshes the loaded page, so there won't be any timing issues. Just make sure the table's DOM is completely loaded before you start.

Don't forget to re-enable the implicit wait once you are done.

It would look like this:

// Disable the implicit wait so lookups that match nothing return immediately.
driver.manage().timeouts().implicitlyWait(0, TimeUnit.SECONDS);
List<ArrayList<String>> rowsData = new ArrayList<ArrayList<String>>();

try {
    // The leading "." keeps each XPath relative to the element it is called on.
    List<WebElement> rows = table.findElements(By.xpath(".//tbody//tr//td//.."));

    for (WebElement row : rows) {
        List<WebElement> rowElements = row.findElements(By.xpath(".//td"));
        ArrayList<String> rowData = new ArrayList<String>();

        for (WebElement column : rowElements) {
            rowData.add(column.getText());
        }

        rowsData.add(rowData);
    }
} finally {
    // Restore the implicit wait before returning; a statement placed after return is unreachable.
    driver.manage().timeouts().implicitlyWait(10, TimeUnit.SECONDS);
}

return rowsData;

Look, the problem is caused by the slowness of Selenium. If you use a library to parse the HTML instead, the same algorithm can run about 1000 times faster.

Main idea:

  1. Do all the work in Selenium except parsing the table.

  2. When you need to parse the table, take the innerHTML of the table via Selenium.

  3. Parse that HTML with an external library.

For C# you can use HTMLAgilityPack. For Java you will need to search for an equivalent. With this approach and the same parsing algorithm, my results were more than 1000 times faster.
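One rough way to see where the difference comes from is to count WebDriver commands rather than measure time. The sketch below assumes a hypothetical table of `rows` by `cols` cells and assumes that each findElements, getText, findElement and getAttribute call costs one round trip to the driver. The class and method names are made up for illustration.

```java
public class RoundTripEstimate {

    // Per-cell approach: 1 lookup for the rows, 1 lookup per row for its cells,
    // and 1 getText call per cell.
    static long perCellCommands(long rows, long cols) {
        return 1 + rows + rows * cols;
    }

    // Bulk approach: 1 findElement for the table and 1 getAttribute for its HTML;
    // the parsing then happens locally, without further driver calls.
    static long bulkCommands() {
        return 2;
    }

    public static void main(String[] args) {
        long rows = 100; // hypothetical table size
        long cols = 10;
        System.out.println("per-cell commands: " + perCellCommands(rows, cols));
        System.out.println("bulk commands: " + bulkCommands());
    }
}
```

For a 100 x 10 table this gives 1101 commands against 2. The ratio of command counts is not the same thing as the wall-clock speed-up, which also depends on the driver, the browser and the page.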

I have written a blog post and an example GitHub project covering this kind of situation; they might help:

http://simpleseleniumnotes.blogspot.com/2015/02/interaction-with-html-tables.html
https://github.com/5hawnknight/solid-prototype-table
