
Errors when I try to scrape this website with Selenium and Python

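When I try to scrape the website, it just throws errors. I think it might be related to my webdriver, but I don't know.

I'm trying to get this data so I can put it into a spreadsheet and get some cool stats out of it.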

from selenium import webdriver

url = 'https://prosettings.net/cs-go-pro-settings-gear-list/'

driver = webdriver.Chrome(executable_path="C:\WebDrivers\chromedriver.exe")
driver.get(url)

names = driver.find_elements_by_class_name("  column-player")

for name in names:
    title = name.find_element_by_xpath('.//a').text
    print(name)

This is the error I get in the terminal:

d:\downloads\PythonScraping\Test.py:5: DeprecationWarning: executable_path has been deprecated, please pass in a Service object
    driver = webdriver.Chrome(executable_path="C:\WebDrivers\chromedriver.exe")

DevTools listening on ws://127.0.0.1:53131/devtools/browser/73ca0453-352e-47a0-a98a-fb539150d6f9
d:\downloads\PythonScraping\Test.py:8: DeprecationWarning: find_elements_by_* commands are deprecated. Please use find_elements() instead
    names = driver.find_elements_by_class_name("  column-player")
Traceback (most recent call last):
  File "d:\downloads\PythonScraping\Test.py", line 8, in <module>
      names = driver.find_elements_by_class_name("  column-player")
  File "C:\Users\terk0\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\selenium\webdriver\remote\webdriver.py", line 783, in find_elements_by_class_name
      return self.find_elements(by=By.CLASS_NAME, value=name)
  File "C:\Users\terk0\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\selenium\webdriver\remote\webdriver.py", line 1279, in find_elements
      return self.execute(Command.FIND_ELEMENTS, {
  File "C:\Users\terk0\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\selenium\webdriver\remote\webdriver.py", line 424, in execute
      self.error_handler.check_response(response)
  File "C:\Users\terk0\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\selenium\webdriver\remote\errorhandler.py", line 247, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.InvalidSelectorException: Message: invalid selector: An invalid or illegal selector was specified
  (Session info: chrome=98.0.4758.102)
Stacktrace:
Backtrace:
        Ordinal0 [0x00EF69A3+2582947]
        Ordinal0 [0x00E8A6D1+2139857]
        Ordinal0 [0x00D83A98+1063576]
        Ordinal0 [0x00D862B7+1073847]
        Ordinal0 [0x00D8617E+1073534]
        Ordinal0 [0x00D863F0+1074160]
        Ordinal0 [0x00DAFCB2+1244338]
        Ordinal0 [0x00DB013B+1245499]
        Ordinal0 [0x00DD9F8C+1417100]
        Ordinal0 [0x00DC8594+1344916]
        Ordinal0 [0x00DD834A+1409866]
        Ordinal0 [0x00DC8366+1344358]
        Ordinal0 [0x00DA5176+1200502]
        Ordinal0 [0x00DA6066+1204326]
        GetHandleVerifier [0x0109BE02+1675858]
        GetHandleVerifier [0x0115036C+2414524]
        GetHandleVerifier [0x00F8BB01+560977]
        GetHandleVerifier [0x00F8A8D3+556323]
        Ordinal0 [0x00E9020E+2163214]
        Ordinal0 [0x00E95078+2183288]
        Ordinal0 [0x00E951C0+2183616]
        Ordinal0 [0x00E9EE1C+2223644]
        BaseThreadInitThunk [0x7586FA29+25]
        RtlGetAppContainerNamedObjectPath [0x77957A9E+286]
        RtlGetAppContainerNamedObjectPath [0x77957A6E+238]

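There are two problems here:

1. Instead of
names = driver.find_elements_by_class_name("  column-player")

it should be

names = driver.find_elements_by_class_name("column-player")

(I know the class attribute on the page has spaces before column-player, but you still shouldn't put them in the locator. As far as I can tell, Selenium turns a class-name locator into a CSS selector, and the leading spaces make that selector invalid, which is the InvalidSelectorException in your traceback.)
2. You should add a wait so that you access these elements only after the page has fully loaded.

This should work better: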

import time

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

url = 'https://prosettings.net/cs-go-pro-settings-gear-list/'

# Pass the driver path through a Service object; executable_path is deprecated
driver = webdriver.Chrome(service=Service(r"C:\WebDrivers\chromedriver.exe"))
wait = WebDriverWait(driver, 20)

driver.get(url)

# Wait until at least one matching element is visible
wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, ".column-player")))
# Short additional wait for the other elements to finish loading
time.sleep(0.5)
# find_elements_by_* is deprecated; use find_elements(By..., value) instead
names = driver.find_elements(By.CLASS_NAME, "column-player")

for name in names:
    title = name.find_element(By.XPATH, './/a').text
    print(title)  # print the link text, not the WebElement object
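
If you often copy class attributes straight out of the page source, a small helper can clean them up before they reach the locator. This is only a sketch: `class_attr_to_css` is a name I made up, it is not part of Selenium, and `column-2` in the second call is a hypothetical extra class used for illustration. It assumes the class names are plain identifiers that need no CSS escaping.

```python
def class_attr_to_css(class_attr: str) -> str:
    """Turn an HTML class attribute value into a CSS class selector.

    Hypothetical helper, not a Selenium API. Assumes each class name is a
    plain identifier (letters, digits, '-' and '_') that needs no escaping.
    """
    # split() with no arguments drops leading, trailing and repeated whitespace
    classes = class_attr.split()
    if not classes:
        raise ValueError("class attribute contains no class names")
    # Chain the classes so the selector matches elements carrying all of them
    return "".join("." + name for name in classes)


print(class_attr_to_css("  column-player"))         # .column-player
print(class_attr_to_css("column-2 column-player"))  # .column-2.column-player
```

The result is meant to be used with a CSS locator, for example `driver.find_elements(By.CSS_SELECTOR, class_attr_to_css("  column-player"))`, rather than with `By.CLASS_NAME`, which expects a single class name without the leading dot.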
