Scraping a web page with a JavaScript-rendered table using Selenium WebDriver

The source HTML of the page I am trying to scrape.

I am trying to scrape a web table that is rendered by JavaScript, using Selenium WebDriver:

driver.get("http://xxxxx:xxxxxxxx@xxxxxx-xxxxxx.grid.xxxxxx.com/Windchill/app/#ptc1/comp/queue.table");
driver.manage().timeouts().implicitlyWait(20, TimeUnit.SECONDS);
List<WebElement> k=driver.findElements(By.xpath("//*[@id='queue.table']"));
System.out.println(k.size());
System.out.println(k.get(0).getText());

k.size() returns 1, and when I call getText() it returns only some of the entries from the table.

Actual table and entries: the total number of rows is 135.

After running the code, I get the following output:

              Queue Management
 Loading...

 Name
 Type
 Status
 Enabled
 Group
 Total Entries
 Waiting Entries
 Severe/Failed Entries
 DeleteCompletedWorkItemsQueue
 Process
 Started
 Enabled
 Default
 0
 0
 0
 DeliveryStatusOnStartup
 Process
 Started
 Enabled
 Default
 0
 0
 0
 DTODeliverablesQueue
 Process
 Started
 Enabled
 Default
 0
 0
 0
 DTOOffPeakQueue
 Process
 Started
 Enabled
 Default
 0
 0
 0
Loading.........

I get 25 entries of the table and the rest are missing. I cannot understand why I am getting "Loading.....".
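The text printed above has one cell per line, so it can be regrouped into rows before it is inspected further. The following is only a minimal sketch: it assumes every row has exactly the eight columns listed in the header (Name through Severe/Failed Entries), that the table starts at the first "Name" line, and that lines beginning with "Loading" are placeholders rather than data. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class QueueTableText {
    // Assumption: eight columns per row, as in the printed header.
    static final int COLUMNS = 8;

    /** Groups the flat getText() output into rows of COLUMNS cells each (header row first). */
    static List<List<String>> toRows(String tableText) {
        List<String> cells = new ArrayList<>();
        boolean inTable = false;
        for (String raw : tableText.split("\\R")) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue;
            }
            if (!inTable) {
                // Skip the title and the first placeholder until the header starts.
                if (!line.equals("Name")) {
                    continue;
                }
                inTable = true;
            }
            if (line.startsWith("Loading")) {
                break; // trailing placeholder, not table data
            }
            cells.add(line);
        }
        List<List<String>> rows = new ArrayList<>();
        // Any incomplete trailing group is dropped rather than guessed at.
        for (int i = 0; i + COLUMNS <= cells.size(); i += COLUMNS) {
            rows.add(new ArrayList<>(cells.subList(i, i + COLUMNS)));
        }
        return rows;
    }

    public static void main(String[] args) {
        String sample = String.join("\n",
            "Queue Management", "Loading...", "",
            "Name", "Type", "Status", "Enabled", "Group",
            "Total Entries", "Waiting Entries", "Severe/Failed Entries",
            "DTOOffPeakQueue", "Process", "Started", "Enabled", "Default", "0", "0", "0",
            "Loading.........");
        for (List<String> row : toRows(sample)) {
            System.out.println(row);
        }
    }
}
```

In the real script the input would be k.get(0).getText(); counting the grouped rows (minus the header) is a quick way to confirm that only 25 of the 135 entries were present when the text was read.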

I think that with List<WebElement> k=driver.findElements(By.xpath("//*[@id='queue.table']")); you end up with a list holding the whole table element, including many unwanted items. Instead, I feel it would be more effective to collect the nodes that actually hold the intended values, such as the <tr> rows and the <td> cells inside them, and save those into the list. You can then iterate over the list and use either the getText() method or the getAttribute() method to retrieve the text, as follows:

driver.get("http://xxxxx:xxxxxxxx@xxxxxx-xxxxxx.grid.xxxxxx.com/Windchill/app/#ptc1/comp/queue.table");
driver.manage().timeouts().implicitlyWait(20, TimeUnit.SECONDS);
// Select the rows inside the table instead of the table element itself.
List<WebElement> rows = driver.findElements(By.xpath("//*[@id='queue.table']//tr"));
System.out.println(rows.size());
for (WebElement row : rows) {
    // innerHTML returns the row's markup, including its <td> cells.
    String innerHtml = row.getAttribute("innerHTML");
    System.out.println("Value from Table is : " + innerHtml);
}
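The code above changes which nodes are selected, but it does not by itself explain the "Loading..." placeholder. One possible explanation, which is only an assumption here, is that the page fills the table in batches, so the rows are read before all 135 exist. Note that implicitlyWait only controls how long an element lookup waits for a match; it does not wait for a table to finish populating. The sketch below polls a row count until it stops changing. The class and method names and the parameter values are hypothetical, and it is written against a plain IntSupplier so it can be tried without a browser.

```java
import java.util.function.IntSupplier;

class StableCount {
    /**
     * Polls the counter until it returns the same value for stablePolls
     * consecutive polls, or until maxPolls polls have been made.
     * Returns the last value seen.
     */
    static int waitForStableCount(IntSupplier counter, int stablePolls,
                                  int maxPolls, long pauseMillis)
            throws InterruptedException {
        int last = counter.getAsInt();
        int unchanged = 0;
        for (int i = 1; i < maxPolls && unchanged < stablePolls; i++) {
            Thread.sleep(pauseMillis);
            int current = counter.getAsInt();
            // Reset the streak whenever the count changes.
            unchanged = (current == last) ? unchanged + 1 : 0;
            last = current;
        }
        return last;
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulated row counts standing in for a table that loads in batches.
        int[] counts = {25, 50, 75, 100, 135};
        int[] calls = {0};
        IntSupplier fake = () -> counts[Math.min(calls[0]++, counts.length - 1)];
        System.out.println(waitForStableCount(fake, 2, 20, 10L));
    }
}
```

With Selenium, the counter could be () -> driver.findElements(By.xpath("//*[@id='queue.table']//tr")).size(). If the page only loads further rows when the table is scrolled, polling alone will not be enough and the script would also have to trigger that scrolling; whether that applies to this page is not known from the question.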
