
Scraping an 'onclick' table with Selenium in Python

I am attempting to scrape the following webpage, using Selenium in Python (with Chrome Web Driver).

https://www.betexplorer.com/soccer/argentina/superliga/argentinos-jrs-talleres-cordoba/ptSIK7kB/#ah

I only wish to collect the rows of data in which the bookmaker is Bet365.

I have been able to obtain all the rows where this is the case. However, I am struggling to scrape the information within the 'onclick' table that appears when the values are clicked:


The image above shows the table ARCHIVE ODDS, which appears when the 5.90 is clicked.

The aim is to collect the information from each table in all the rows where Bet365 is the bookmaker.

My attempt so far has been to locate all the 'onclick' links using a CSS-selector:

table_links = browser.find_elements_by_css_selector("span[onclick*='16);']")

And then to loop through each of the table_links, click each one, and scrape the data which appears using the xpath:

bet365table = []
for i in table_links:
    i.click()  # the click described above; this is where "element is not clickable" is raised
    xx = browser.find_element_by_xpath("//TBODY[@id='aodds-tbody']")

However, this fails every time with an error stating that the element is not clickable.

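You could also mimic the XHR requests that the page makes when a value is clicked and read the JSON responses directly. Bet365 has a bookmaker id of 16, so you can pick out the qualifying rows with a CSS selector that matches onclick attributes ending in ", 16);":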

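import requests
import pandas as pd
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

url = 'https://www.betexplorer.com/soccer/argentina/superliga/argentinos-jrs-talleres-cordoba/ptSIK7kB/#ah'
base = 'https://www.betexplorer.com/archive-odds/'

d = webdriver.Chrome()
d.get(url)  # load the page before waiting on its elements
# Wait until the Bet365 logo links (class l16) are visible
WebDriverWait(d, 10).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, ".in-bookmaker-logo-link.in-bookmaker-logo-link--primary.l16")))

# Elements whose onclick ends with ", 16);" belong to Bet365 (bookmaker id 16)
links = d.find_elements(By.CSS_SELECTOR, "[onclick$=', 16);']")
# onclick looks like load_odds_archive(this, '<id>', 16); so the id is the text between the quotes.
# str.strip() removes a set of characters, not a prefix/suffix, and could eat part of the id.
extracted_links = [link.get_attribute("onclick").split("'")[1] for link in links]
json_links = [base + link + '/16/?_=1' for link in extracted_links]
d.quit()

results = []  # one DataFrame per clicked value
for link in json_links:
    res = requests.get(link)
    data = pd.json_normalize(res.json())  # flatten the JSON response into a DataFrame
    results.append(data)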
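
Note that str.strip removes a set of characters from both ends of a string rather than a literal prefix or suffix, so it is not a reliable way to pull the id out of the onclick attribute (an id ending in 1 or 6, for example, would be truncated). Below is a minimal sketch of extracting the id with a regular expression instead. It can be checked without a browser. It assumes the attribute has the form load_odds_archive(this, '<id>', 16); as in the selector above. The helper names extract_archive_id and archive_url and the sample id are made up for illustration.

```python
import re

BASE = "https://www.betexplorer.com/archive-odds/"

# Matches: load_odds_archive(this, '<id>', <bookmaker>);
ONCLICK_RE = re.compile(r"load_odds_archive\(this,\s*'([^']+)',\s*(\d+)\);")


def extract_archive_id(onclick, bookmaker_id=16):
    """Return the odds id from an onclick string, or None if it does not match."""
    match = ONCLICK_RE.search(onclick)
    if match is None or int(match.group(2)) != bookmaker_id:
        return None
    return match.group(1)


def archive_url(odds_id, bookmaker_id=16):
    """Build the archive-odds URL in the same shape as the snippet above."""
    return f"{BASE}{odds_id}/{bookmaker_id}/?_=1"


sample = "load_odds_archive(this, 'abc61', 16);"  # made-up id for illustration
odds_id = extract_archive_id(sample)
print(odds_id)
print(archive_url(odds_id))
```

Returning None for non-matching attributes makes it easy to filter out rows for other bookmakers before any request is sent.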

