Scraping a specific table in selenium
I am trying to scrape a table found inside a div on a page.
This is basically my attempt so far:
# NOTE: Download the chromedriver driver
# Then move exe file on C:\Python27\Scripts
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time
import sys
driver = webdriver.Chrome()
driver.implicitly_wait(10)
URL_start = "http://www.google.us/trends/explore?"
date = '&date=today%203-m' # Last 90 days
location = "&geo=US"
symbol = sys.argv[1]
query = 'q='+symbol
URL = URL_start+query+date+location
driver.get(URL)
table = driver.find_element_by_xpath('//div[@class="line-chart"]/table/tbody')
print table.text
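If I run the script with an argument such as "stackoverflow", it should scrape this page: https://www.google.us/trends/explore?date=today%203-m&geo=US&q=stackoverflow
Apparently the xpath I have there is not working: the program prints nothing, just a blank line.
I basically need the values of the chart shown on that page. Those values (and the dates) are inside a table, as seen in a screenshot of the page.
Could you help me find the correct xpath for the table, so that I can retrieve those values using selenium in Python?
Thanks in advance!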
You can use the following XPath:
//div[@class="line-chart"]/div/div[1]/div/div/table/tbody/tr
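To illustrate why the extra `div` levels matter, here is a minimal sketch that runs both XPaths against a small, hypothetical HTML fragment using only the standard library. The markup and the values in it are made up for the illustration: they only mimic the nesting that the XPath above implies and are not copied from the real page. The shorter path from the question matches nothing because the table is not a direct child of the `line-chart` div.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup with placeholder values. It only mirrors
# the nesting implied by the XPath, not the actual Google Trends page.
SAMPLE = """
<html><body>
  <div class="line-chart">
    <div>
      <div>
        <div>
          <div>
            <table>
              <thead><tr><th>Date</th><th>stackoverflow</th></tr></thead>
              <tbody>
                <tr><td>Jan 1</td><td>87</td></tr>
                <tr><td>Jan 2</td><td>91</td></tr>
                <tr><td>Note</td></tr>
              </tbody>
            </table>
          </div>
        </div>
      </div>
    </div>
  </div>
</body></html>
""".strip()

root = ET.fromstring(SAMPLE)

# ElementTree needs relative paths, hence the leading "." on both.
SHORT_XPATH = ".//div[@class='line-chart']/table/tbody"
FULL_XPATH = ".//div[@class='line-chart']/div/div[1]/div/div/table/tbody/tr"

# The question's path expects the table directly under the div: no match.
short_matches = root.findall(SHORT_XPATH)

# The longer path walks through the wrapper divs and reaches the rows.
rows = []
for tr in root.findall(FULL_XPATH):
    cells = tr.findall("td")
    if len(cells) == 2:  # keep only (date, value) rows
        rows.append((cells[0].text, cells[1].text))

print(short_matches)
print(rows)
```

On the live page the same idea applies, but the exact number of wrapper divs should be checked in the browser's developer tools, since the page structure may differ from this sketch.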
Here is my refined answer, with some changes to your code so that it now works.
# NOTE: Download the chromedriver driver
# Then move exe file on C:\Python27\Scripts
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time
import sys
from lxml.html import fromstring,tostring
driver = webdriver.Chrome()
driver.implicitly_wait(20)
'''
URL_start = "http://www.google.us/trends/explore?"
date = '&date=today%203-m' # Last 90 days
location = "&geo=US"
symbol = sys.argv[1]
query = 'q='+symbol
URL = URL_start+query+date+location
'''
driver.get("https://www.google.us/trends/explore?date=today%203-m&geo=US&q=stackoverflow")
table_trs = driver.find_elements_by_xpath('//div[@class="line-chart"]/div/div[1]/div/div/table/tbody/tr')
for tr in table_trs:
    #print tr.get_attribute("innerHTML").encode("UTF-8")
    td = tr.find_elements_by_xpath(".//td")
    if len(td)==2:
        print td[0].get_attribute("innerHTML").encode("UTF-8") +"\t"+td[1].get_attribute("innerHTML").encode("UTF-8")
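The code above targets Python 2 and the older `find_elements_by_xpath` method. For readers on Python 3 with Selenium 4, the same approach might look roughly like the sketch below. It is not part of the original answer: the helper names `build_url`, `two_cell_rows` and `scrape` are hypothetical, and it assumes the page still has the structure that the XPath describes, which may no longer be true. It also puts the `WebDriverWait` import to use with an explicit wait instead of relying only on `implicitly_wait`.

```python
from urllib.parse import quote, urlencode

# XPath taken from the answer; assumed to still match the page.
ROW_XPATH = '//div[@class="line-chart"]/div/div[1]/div/div/table/tbody/tr'


def build_url(symbol):
    """Build the explore URL with the question's parameters (hypothetical helper)."""
    params = {"date": "today 3-m", "geo": "US", "q": symbol}
    # quote_via=quote encodes the space as %20 rather than "+".
    return "https://www.google.us/trends/explore?" + urlencode(params, quote_via=quote)


def two_cell_rows(rows):
    """Keep only rows with exactly two cells, as (date, value) pairs."""
    return [(r[0].strip(), r[1].strip()) for r in rows if len(r) == 2]


def scrape(symbol, timeout=20):
    """Open the page in Chrome and return the table rows (hypothetical helper)."""
    # Imported lazily so the helpers above work without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(build_url(symbol))
        # Wait until at least one matching row is present in the DOM.
        trs = WebDriverWait(driver, timeout).until(
            EC.presence_of_all_elements_located((By.XPATH, ROW_XPATH))
        )
        rows = [
            # innerHTML, as in the answer, is also readable for hidden cells.
            [td.get_attribute("innerHTML") for td in tr.find_elements(By.XPATH, ".//td")]
            for tr in trs
        ]
        return two_cell_rows(rows)
    finally:
        driver.quit()
```

Calling `scrape("stackoverflow")` would then return a list of `(date, value)` tuples that can be printed tab-separated, as in the loop above. If the wait times out, the XPath most likely no longer matches the page.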