
Unable to locate the pagination links when scraping a website using Selenium and Python

I'm learning to use Selenium for web scraping. I have a couple of questions about the website I'm working with:

- The website has multiple pages to go over and I can't seem to find a way to locate the pages' paths and iterate over them. For example, the following code returns link_page as NoneType:

from selenium import webdriver
import time

driver = webdriver.Chrome('chromedriver')

driver.get('https://www.oddsportal.com/soccer/england/premier-league')
time.sleep(0.5)
results_button = driver.find_element_by_xpath('/html/body/div[1]/div/div[2]/div[6]/div[1]/div/div[1]/div[2]/div[1]/div[2]/ul/li[3]/span')
results_button.click()
time.sleep(3)

season_button = driver.find_element_by_xpath('/html/body/div[1]/div/div[2]/div[6]/div[1]/div/div[1]/div[2]/div[1]/div[3]/ul/li[2]/span/strong/a')
season_button.click()

link_page = driver.find_element_by_xpath('/html/body/div[1]/div/div[2]/div[6]/div[1]/div/div[1]/div[2]/div[1]/div[6]/div/a[3]/span').get_attribute('href')
print(link_page)  # get_attribute() returns a string (or None), not an element, so .text would fail here
driver.get(link_page)

- For some reason I have to click results_button to be able to get the href of matches. For example, the following code tries to go to the page directly (as an attempt to circumvent problem 1 above), but the link_page line raises a NoSuchElementException error:

from selenium import webdriver
import time

driver = webdriver.Chrome('chromedriver')
driver.get('https://www.oddsportal.com/soccer/england/premier-league/results/#/page/2')
time.sleep(3)

link_page = driver.find_element_by_xpath('/html/body/div[1]/div/div[2]/div[6]/div[1]/div/div[1]/div[2]/div[1]/div[6]/table/tbody/tr[11]/td[2]/a').get_attribute('href')
print(link_page)  # get_attribute() returns a string, so .text would fail here
driver.get(link_page)
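A side note that may explain the second failure: everything after the '#' in that URL is a fragment, which the browser handles client-side and never sends to the server. The server therefore returns the same page for every page number, and the results table is filled in by JavaScript afterwards, which is why the element may not exist yet when the lookup runs. The split can be seen with the standard library:

```python
from urllib.parse import urlsplit

url = 'https://www.oddsportal.com/soccer/england/premier-league/results/#/page/2'
parts = urlsplit(url)

# Only scheme, host and path reach the server; '/page/2' stays in the fragment.
print(parts.path)      # /soccer/england/premier-league/results/
print(parts.fragment)  # /page/2
```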

To locate the page links and go over them using Selenium you need to induce WebDriverWait for visibility_of_all_elements_located() and you can use the following Locator Strategy:

  • Using XPATH:

     driver.get('https://www.oddsportal.com/soccer/england/premier-league/')
     WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//a[text()='RESULTS']"))).click()
     WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//a[text()='2018/2019']"))).click()
     print([my_elem.get_attribute("href") for my_elem in WebDriverWait(driver, 10).until(EC.visibility_of_all_elements_located((By.XPATH, "//span[@class='active-page']//following::a[@x-page]/span[not(contains(., '|')) and not(contains(., '»'))]/..")))])
  • Console Output:

     ['https://www.oddsportal.com/soccer/england/premier-league-2018-2019/results/#/page/2/', 'https://www.oddsportal.com/soccer/england/premier-league-2018-2019/results/#/page/3/', 'https://www.oddsportal.com/soccer/england/premier-league-2018-2019/results/#/page/4/', 'https://www.oddsportal.com/soccer/england/premier-league-2018-2019/results/#/page/5/', 'https://www.oddsportal.com/soccer/england/premier-league-2018-2019/results/#/page/6/', 'https://www.oddsportal.com/soccer/england/premier-league-2018-2019/results/#/page/7/', 'https://www.oddsportal.com/soccer/england/premier-league-2018-2019/results/#/page/8/']
  • Note: You have to add the following imports:

     from selenium.webdriver.support.ui import WebDriverWait
     from selenium.webdriver.common.by import By
     from selenium.webdriver.support import expected_conditions as EC
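Since the hrefs in the console output above follow a fixed pattern (…/results/#/page/N/), the page URLs could also be constructed directly instead of scraped from the pager. A minimal sketch, assuming the last page number is already known (the helper name is mine):

```python
def build_page_urls(base, last_page):
    """Build oddsportal-style paginated results URLs for pages 2..last_page."""
    return [f"{base}#/page/{n}/" for n in range(2, last_page + 1)]

pages = build_page_urls(
    'https://www.oddsportal.com/soccer/england/premier-league-2018-2019/results/', 8)
print(pages[0])    # https://www.oddsportal.com/soccer/england/premier-league-2018-2019/results/#/page/2/
print(len(pages))  # 7
```

Each URL can then be fed to driver.get() in a loop, waiting for the table rows on each page before reading them.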

