Scrape a table using Selenium and PhantomJS

I am trying to scrape the following table:

[Image: screenshot of the table]

My code works with the Chrome WebDriver, but with the PhantomJS driver the output doesn't seem to include the numbers; it only gets the text.

[Image: screenshot of the PhantomJS output]

Here is my Python code:

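    from selenium import webdriver

    # Selenium 3.x API: webdriver.PhantomJS and the find_element(s)_by_* helpers
    # no longer exist in current Selenium 4 releases.
    path_to_chromedriver = '/Users/amr_f/Desktop/chromedriver'  # used with webdriver.Chrome(path_to_chromedriver); change path as needed
    browser = webdriver.PhantomJS('/home/ubuntu/phantomjs-2.1.1-linux-x86_64/bin/phantomjs')
    url = 'http://www.cibeg.com/English/Pages/default.aspx'
    browser.get(url)

    # Open the tab that shows the currency table.
    browser.find_element_by_xpath('//*[@id="sliderHome"]/div[2]/div/ul/li[3]/a').click()

    data = []

    # The XPath matches the <table> element itself, so each iteration handles one
    # whole table and collects all of its cells into a single flat list.
    for table in browser.find_elements_by_xpath('//*[@id="divCurrTableContainer"]/table'):
        tds = table.find_elements_by_tag_name('td')
        if tds:
            data.append([td.text for td in tds])
    print(data)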

Adding browser.set_window_size(1124, 850) to set the window size of the PhantomJS driver let me retrieve the table's data from the page.

If I recall correctly, this happens because some JavaScript libraries read the window size when the page loads. Without an explicit window size, the page's scripts may not load or lay out all of its elements correctly. A related detail: Selenium's .text returns only an element's visible text, so any cells the page hides at a small window size come back as empty strings.
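Because this failure is silent (the script still runs, the rate cells just come back empty), it can help to check the scraped cells before using them. Below is a minimal sketch, assuming the flat [currency, rate, rate, ...] layout shown in the working output further down and assuming rates look like plain decimals; missing_rate_cells is a hypothetical helper name, not part of Selenium.

```python
import re

# Assumed rate format: plain decimals such as "16.26" or "53.5202".
_RATE_RE = re.compile(r"^\d+(?:\.\d+)?$")


def missing_rate_cells(cells):
    """Return indices of cells that should hold a rate but do not.

    Hypothetical helper; assumes every currency occupies three consecutive
    cells: [currency_code, rate, rate].
    """
    bad = []
    for i, cell in enumerate(cells):
        if i % 3 == 0:
            continue  # currency-code column, e.g. "USD"
        if not _RATE_RE.match(cell.strip()):
            bad.append(i)  # empty or non-numeric where a rate was expected
    return bad


sample = ['USD', '', '', 'EUR', '17.6696', '18.3563']
print(missing_rate_cells(sample))  # [1, 2]
```

In a scraper you could raise an error whenever missing_rate_cells(data[0]) is non-empty, so a bad window size shows up as a failure instead of a half-empty result.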

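    from selenium import webdriver

    # Selenium 3.x API: webdriver.PhantomJS and the find_element(s)_by_* helpers
    # no longer exist in current Selenium 4 releases.
    browser = webdriver.PhantomJS('/home/ubuntu/phantomjs-2.1.1-linux-x86_64/bin/phantomjs')
    # Set the window size before loading the page, so scripts that read it on load see a desktop-sized window.
    browser.set_window_size(1124, 850)
    url = 'http://www.cibeg.com/English/Pages/default.aspx'
    browser.get(url)
    # Open the tab that shows the currency table.
    browser.find_element_by_xpath('//*[@id="sliderHome"]/div[2]/div/ul/li[3]/a').click()
    data = []

    # Each iteration handles one whole <table> and collects all of its cells into a flat list.
    for table in browser.find_elements_by_xpath('//*[@id="divCurrTableContainer"]/table'):
        tds = table.find_elements_by_tag_name('td')
        if tds:
            data.append([td.text for td in tds])

    print(data)
    browser.quit()  # shut down the PhantomJS process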

After adding the window size, I was able to retrieve:

    [['USD', '16.26', '16.75', 'EUR', '17.6696', '18.3563', 'GBP', '20.0895', '20.8621', 'CHF', '16.4571', '17.0536', 'SAR', '4.3297', '4.4663', 'KWD', '53.5202', '55.3353']]
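The result is one flat list per table. If you need per-currency values, a small post-processing step can group the cells in threes. This is a minimal sketch, assuming each currency is exactly [code, rate, rate]. What the two rate columns mean (buy vs. sell, for example) is an assumption, so check the table headers on the page. parse_rates is a hypothetical helper name.

```python
def parse_rates(cells):
    """Group a flat [code, rate, rate, code, rate, rate, ...] list into a dict.

    Hypothetical helper; the meaning of the two rate columns is assumed,
    not taken from the page.
    """
    if len(cells) % 3 != 0:
        raise ValueError(f"expected a multiple of 3 cells, got {len(cells)}")
    rates = {}
    for i in range(0, len(cells), 3):
        code, first, second = cells[i:i + 3]
        rates[code] = (float(first), float(second))
    return rates


# The output retrieved above, after setting the window size.
data = [['USD', '16.26', '16.75', 'EUR', '17.6696', '18.3563', 'GBP', '20.0895', '20.8621',
         'CHF', '16.4571', '17.0536', 'SAR', '4.3297', '4.4663', 'KWD', '53.5202', '55.3353']]
rates = parse_rates(data[0])
print(rates["USD"])  # (16.26, 16.75)
```

float() raises ValueError on an empty string, so running this on the PhantomJS output without the window-size fix would fail loudly instead of returning blanks.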
