
Scraping a specific table in Selenium

I am trying to scrape a table that sits inside a div on a page.

So far, this is basically my attempt:

# NOTE: Download the chromedriver driver
# Then move exe file on C:\Python27\Scripts
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time
import sys

driver = webdriver.Chrome()
driver.implicitly_wait(10)

URL_start = "http://www.google.us/trends/explore?"
date = '&date=today%203-m' # Last 90 days
location = "&geo=US"
symbol = sys.argv[1]
query = 'q='+symbol
URL = URL_start+query+date+location

driver.get(URL)

table = driver.find_element_by_xpath('//div[@class="line-chart"]/table/tbody')

print table.text

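If you run the script with an argument such as "stackoverflow", it should scrape this page: https://www.google.us/trends/explore?date=today%203-m&geo=US&q=stackoverflow

Evidently the XPath I have there is not working: the program prints nothing, just a blank line.

I basically need the chart values shown on that page. The values (and dates) are in a table; here is a screenshot:

[image: enter image description here]

Could you help me find the correct XPath for the table so that I can retrieve those values with Selenium in Python?

Thanks in advance!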

You can use the following XPath:

//div[@class="line-chart"]/div/div[1]/div/div/table/tbody/tr
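To see why the extra steps matter, here is a minimal sketch that runs both paths against a small piece of markup. The markup and its cell values are made up for illustration (the real page's nesting may differ), and it uses the standard library's ElementTree, which needs a leading "." on the path, rather than Selenium:

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: the table sits four wrapper divs below
# the "line-chart" div. The cell values are placeholders, not real data.
SAMPLE = """
<html><body>
  <div class="line-chart">
    <div><div><div><div>
      <table>
        <tbody>
          <tr><td>date-1</td><td>10</td></tr>
          <tr><td>date-2</td><td>20</td></tr>
        </tbody>
      </table>
    </div></div></div></div>
  </div>
</body></html>
"""

root = ET.fromstring(SAMPLE)

# Path from the question: expects <table> as a direct child of the div.
direct = root.findall('.//div[@class="line-chart"]/table/tbody')

# Path from this answer: steps through the wrapper divs before <table>.
nested = root.findall(
    './/div[@class="line-chart"]/div/div[1]/div/div/table/tbody/tr'
)

# Collect the text of the cells in each matched row.
rows = [[td.text for td in tr.findall("./td")] for tr in nested]

print(len(direct))  # number of matches for the question's path
print(rows)         # rows matched by the answer's path
```

With "/" each step only matches direct children, so the shorter path finds nothing when wrapper divs sit between the chart div and the table.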

Here is my refined answer, with some changes to your code so that it works.

# NOTE: Download the chromedriver driver
# Then move exe file on C:\Python27\Scripts
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time
import sys
from lxml.html import fromstring,tostring

driver = webdriver.Chrome()
driver.implicitly_wait(20)
'''
URL_start = "http://www.google.us/trends/explore?"
date = '&date=today%203-m' # Last 90 days
location = "&geo=US"
symbol = sys.argv[1]
query = 'q='+symbol
URL = URL_start+query+date+location
'''
driver.get("https://www.google.us/trends/explore?date=today%203-m&geo=US&q=stackoverflow")

table_trs = driver.find_elements_by_xpath('//div[@class="line-chart"]/div/div[1]/div/div/table/tbody/tr')

for tr in table_trs:
    # Keep only rows that have exactly two cells (date and value).
    td = tr.find_elements_by_xpath(".//td")
    if len(td)==2:
        # Print the two cells separated by a tab.
        print td[0].get_attribute("innerHTML").encode("UTF-8") +"\t"+td[1].get_attribute("innerHTML").encode("UTF-8")
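For anyone on Python 3 and Selenium 4, where the find_elements_by_xpath helpers are no longer available, the same idea might look roughly like the sketch below. The function names build_url and scrape_rows are my own, not part of any library. It swaps the implicit wait for an explicit WebDriverWait, and it assumes the page still has the structure the XPath above expects, which may no longer be true:

```python
from urllib.parse import quote, urlencode

# XPath from the answer above; assumed to still match the page structure.
XPATH_ROWS = '//div[@class="line-chart"]/div/div[1]/div/div/table/tbody/tr'


def build_url(symbol):
    """Build the Trends URL used in the question (last 90 days, US)."""
    params = {"q": symbol, "date": "today 3-m", "geo": "US"}
    # quote_via=quote encodes the space as %20, as in the original URL.
    return "http://www.google.us/trends/explore?" + urlencode(
        params, quote_via=quote
    )


def scrape_rows(symbol, timeout=20):
    """Return (date, value) pairs from the chart table (hypothetical helper)."""
    # Imported here so build_url can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(build_url(symbol))
        # Explicit wait: block until at least one matching row is in the DOM.
        rows = WebDriverWait(driver, timeout).until(
            EC.presence_of_all_elements_located((By.XPATH, XPATH_ROWS))
        )
        result = []
        for tr in rows:
            cells = tr.find_elements(By.XPATH, ".//td")
            if len(cells) == 2:
                result.append(
                    (
                        cells[0].get_attribute("innerHTML"),
                        cells[1].get_attribute("innerHTML"),
                    )
                )
        return result
    finally:
        # Always close the browser, even if the wait times out.
        driver.quit()
```

Calling scrape_rows("stackoverflow") would open Chrome and return the two-cell rows as tuples. If the wait times out, Selenium raises a TimeoutException, which is easier to diagnose than a silently empty result.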
