简体   繁体   中英

Scraping a specific table in selenium

I am trying to scrape a table found inside a div on a page.

Basically here's my attempt so far:

# NOTE: Download the chromedriver driver
# Then move exe file on C:\Python27\Scripts
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time
import sys

driver = webdriver.Chrome()
driver.implicitly_wait(10)

URL_start = "http://www.google.us/trends/explore?"
date = '&date=today%203-m' # Last 90 days
location = "&geo=US"
symbol = sys.argv[1]
query = 'q='+symbol
URL = URL_start+query+date+location

driver.get(URL)

table = driver.find_element_by_xpath('//div[@class="line-chart"]/table/tbody')

print table.text

If I run the script, with an argument like "stackoverflow" I should be able to scrape this site: https://www.google.us/trends/explore?date=today%203-m&geo=US&q=stackoverflow

Apparently the xpath I have there is not working, the program is not printing anything, it's just plain blank.

I am basically in need on the values of the chart that appears on that website. And those values (and dates) are inside a table, here is a screenshot:

在此处输入图片说明

Could you help me locate the correct xpath of the table to retrieve those values using selenium on python?

Thanks in advance!

you can use Xpath As Follow:

//div[@class="line-chart"]/div/div[1]/div/div/table/tbody/tr

Here I will Refine my answer and make some changes in your code not it's work.

# NOTE: Download the chromedriver driver
# Then move exe file on C:\Python27\Scripts
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time
import sys
from lxml.html import fromstring,tostring

driver = webdriver.Chrome()
driver.implicitly_wait(20)
'''
URL_start = "http://www.google.us/trends/explore?"
date = '&date=today%203-m' # Last 90 days
location = "&geo=US"
symbol = sys.argv[1]
query = 'q='+symbol
URL = URL_start+query+date+location
'''
driver.get("https://www.google.us/trends/explore?date=today%203-m&geo=US&q=stackoverflow")

table_trs = driver.find_elements_by_xpath('//div[@class="line-chart"]/div/div[1]/div/div/table/tbody/tr')

for tr in table_trs:
    #print tr.get_attribute("innerHTML").encode("UTF-8")

    td = tr.find_elements_by_xpath(".//td")
    if len(td)==2:
        print td[0].get_attribute("innerHTML").encode("UTF-8") +"\t"+td[1].get_attribute("innerHTML").encode("UTF-8")

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM