
Scraping a specific table in Selenium

I am trying to scrape a table found inside a div on a page.

Here's my attempt so far:

# NOTE: Download chromedriver first,
# then move the .exe file into C:\Python27\Scripts
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time
import sys

driver = webdriver.Chrome()
driver.implicitly_wait(10)

URL_start = "http://www.google.us/trends/explore?"
date = '&date=today%203-m' # Last 90 days
location = "&geo=US"
symbol = sys.argv[1]
query = 'q='+symbol
URL = URL_start+query+date+location

driver.get(URL)

table = driver.find_element(By.XPATH, '//div[@class="line-chart"]/table/tbody')

print(table.text)

If I run the script with an argument like "stackoverflow", I should be able to scrape this page: https://www.google.us/trends/explore?date=today%203-m&geo=US&q=stackoverflow
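The script above assembles this URL by string concatenation. As a minimal sketch (assuming Python 3), the same query string can be built with the standard library's `urllib.parse.urlencode`; passing `quote_via=quote` encodes the space in `today 3-m` as `%20` rather than `+`, so the result matches the URL above, and a search term containing characters such as `+` or `&` gets percent-encoded instead of breaking the query. The helper name `build_trends_url` is hypothetical.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://www.google.us/trends/explore?"


def build_trends_url(symbol, date="today 3-m", geo="US"):
    """Return the Trends explore URL for one search term (hypothetical helper)."""
    params = {"date": date, "geo": geo, "q": symbol}
    # quote_via=quote percent-encodes spaces as %20 instead of '+'
    return BASE_URL + urlencode(params, quote_via=quote)


if __name__ == "__main__":
    print(build_trends_url("stackoverflow"))
```

In the script, `driver.get(build_trends_url(sys.argv[1]))` would then replace the manual concatenation.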

Apparently the XPath I'm using doesn't work: the program prints nothing, the output is just blank.

What I need are the values of the chart shown on that page. Those values (and their dates) are inside a table; here is a screenshot:

[Screenshot]

Could you help me find the correct XPath for the table so I can retrieve those values with Selenium in Python?

Thanks in advance!

You can use the following XPath:

//div[@class="line-chart"]/div/div[1]/div/div/table/tbody/tr
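To see which element each step of this XPath selects, here is a minimal sketch that runs the same path against a small hand-written XHTML mock-up using Python's standard `xml.etree.ElementTree`, which supports this subset of XPath. The markup in `MOCK_HTML` is hypothetical, shaped only to match the path; it is not Google Trends' real markup, and the dates and values are placeholders. ElementTree does not accept absolute paths on an element, so the leading `//` becomes `.//`.

```python
import xml.etree.ElementTree as ET

# Hypothetical mock-up: only the nesting implied by the XPath is reproduced.
MOCK_HTML = """
<body>
  <div class="line-chart">
    <div>
      <div>
        <div>
          <div>
            <table>
              <tbody>
                <tr><td>Date A</td><td>10</td></tr>
                <tr><td>Date B</td><td>20</td></tr>
              </tbody>
            </table>
          </div>
        </div>
      </div>
      <div>second child div, skipped by div[1]</div>
    </div>
  </div>
</body>
"""

# Same steps as the answer's XPath: chart div > div > first div > div > div > table rows
ROW_PATH = './/div[@class="line-chart"]/div/div[1]/div/div/table/tbody/tr'


def extract_rows(markup):
    """Return (date, value) text pairs for every two-cell row the path matches."""
    root = ET.fromstring(markup.strip())
    rows = []
    for tr in root.findall(ROW_PATH):
        cells = tr.findall("td")
        if len(cells) == 2:  # same two-cell check as the answer's loop
            rows.append((cells[0].text, cells[1].text))
    return rows


print(extract_rows(MOCK_HTML))  # [('Date A', '10'), ('Date B', '20')]
```

Note that `div[1]` picks the first `div` child of its parent, so the second sibling `div` is never descended into.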

Here is my refined answer, with some changes to your code so that it now works:

# NOTE: Download chromedriver first,
# then move the .exe file into C:\Python27\Scripts
import sys

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.implicitly_wait(20)  # wait up to 20 s for elements to appear

URL_start = "http://www.google.us/trends/explore?"
date = '&date=today%203-m'  # Last 90 days
location = "&geo=US"
# Use the command-line argument if given, otherwise default to "stackoverflow"
symbol = sys.argv[1] if len(sys.argv) > 1 else "stackoverflow"
query = 'q=' + symbol
URL = URL_start + query + date + location

driver.get(URL)

# One element per <tr> of the table nested inside the line chart
table_trs = driver.find_elements(
    By.XPATH, '//div[@class="line-chart"]/div/div[1]/div/div/table/tbody/tr')

for tr in table_trs:
    # print(tr.get_attribute("innerHTML"))  # uncomment to inspect a whole row
    td = tr.find_elements(By.XPATH, ".//td")
    # Keep only rows with exactly two cells (date and value)
    if len(td) == 2:
        print(td[0].get_attribute("innerHTML") + "\t" + td[1].get_attribute("innerHTML"))

driver.quit()
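The loop above prints each row's date and value cells as raw `innerHTML` strings. If you need the numbers themselves, a minimal sketch like the following could turn those strings into `(date, int)` tuples and render them as CSV. It assumes each value cell holds a plain integer; `to_records`, `to_csv` and the sample strings are hypothetical placeholders, not scraped data.

```python
import csv
import html
import io


def to_records(cell_pairs):
    """Turn (date_html, value_html) string pairs into (date, int) tuples.

    Hypothetical helper: assumes each value cell holds a plain integer;
    rows whose value does not parse as an integer are skipped.
    """
    records = []
    for date_html, value_html in cell_pairs:
        # innerHTML returns raw markup, so decode entities such as &amp;
        date = html.unescape(date_html).strip()
        value = html.unescape(value_html).strip()
        try:
            records.append((date, int(value)))
        except ValueError:
            continue  # e.g. an empty or non-numeric cell
    return records


def to_csv(records):
    """Render the records as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["date", "value"])
    writer.writerows(records)
    return buf.getvalue()


# Placeholder strings standing in for scraped cell contents
pairs = [("Date A", " 10 "), ("Date B &amp; C", "20"), ("Date D", "n/a")]
records = to_records(pairs)
print(records)  # [('Date A', 10), ('Date B & C', 20)]
print(to_csv(records), end="")
```

Inside the Selenium loop, you would append `(td[0].get_attribute("innerHTML"), td[1].get_attribute("innerHTML"))` to `pairs` instead of printing each row.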

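Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost one, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.

粤ICP备18138465号  © 2020-2024 STACKOOM.COM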