
Errors when I try to scrape this website with Selenium and Python

When I try to scrape the website, it just throws some errors. I think it may have something to do with my WebDriver, but I'm not sure.

I am trying to get this data so I can put it in a spreadsheet and work out some cool statistics.

from selenium import webdriver

url = 'https://prosettings.net/cs-go-pro-settings-gear-list/'

driver = webdriver.Chrome(executable_path="C:\WebDrivers\chromedriver.exe")
driver.get(url)

names = driver.find_elements_by_class_name("  column-player")

for name in names:
    title = name.find_element_by_xpath('.//a').text
    print(name)

Here are the errors I get in the terminal:

d:\downloads\PythonScraping\Test.py:5: DeprecationWarning: executable_path has been deprecated, please pass in a Service object
    driver = webdriver.Chrome(executable_path="C:\WebDrivers\chromedriver.exe")

DevTools listening on ws://127.0.0.1:53131/devtools/browser/73ca0453-352e-47a0-a98a-fb539150d6f9
d:\downloads\PythonScraping\Test.py:8: DeprecationWarning: find_elements_by_* commands are deprecated. Please use find_elements() instead
    names = driver.find_elements_by_class_name("  column-player")
Traceback (most recent call last):
  File "d:\downloads\PythonScraping\Test.py", line 8, in <module>
      names = driver.find_elements_by_class_name("  column-player")
  File "C:\Users\terk0\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\selenium\webdriver\remote\webdriver.py", line 783, in find_elements_by_class_name
      return self.find_elements(by=By.CLASS_NAME, value=name)
  File "C:\Users\terk0\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\selenium\webdriver\remote\webdriver.py", line 1279, in find_elements
      return self.execute(Command.FIND_ELEMENTS, {
  File "C:\Users\terk0\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\selenium\webdriver\remote\webdriver.py", line 424, in execute
      self.error_handler.check_response(response)
  File "C:\Users\terk0\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\selenium\webdriver\remote\errorhandler.py", line 247, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.InvalidSelectorException: Message: invalid selector: An invalid or illegal selector was specified
  (Session info: chrome=98.0.4758.102)
Stacktrace:
Backtrace:
        Ordinal0 [0x00EF69A3+2582947]
        Ordinal0 [0x00E8A6D1+2139857]
        Ordinal0 [0x00D83A98+1063576]
        Ordinal0 [0x00D862B7+1073847]
        Ordinal0 [0x00D8617E+1073534]
        Ordinal0 [0x00D863F0+1074160]
        Ordinal0 [0x00DAFCB2+1244338]
        Ordinal0 [0x00DB013B+1245499]
        Ordinal0 [0x00DD9F8C+1417100]
        Ordinal0 [0x00DC8594+1344916]
        Ordinal0 [0x00DD834A+1409866]
        Ordinal0 [0x00DC8366+1344358]
        Ordinal0 [0x00DA5176+1200502]
        Ordinal0 [0x00DA6066+1204326]
        GetHandleVerifier [0x0109BE02+1675858]
        GetHandleVerifier [0x0115036C+2414524]
        GetHandleVerifier [0x00F8BB01+560977]
        GetHandleVerifier [0x00F8A8D3+556323]
        Ordinal0 [0x00E9020E+2163214]
        Ordinal0 [0x00E95078+2183288]
        Ordinal0 [0x00E951C0+2183616]
        Ordinal0 [0x00E9EE1C+2223644]
        BaseThreadInitThunk [0x7586FA29+25]
        RtlGetAppContainerNamedObjectPath [0x77957A9E+286]
        RtlGetAppContainerNamedObjectPath [0x77957A6E+238]

There are 2 problems here:

1. Instead of
names = driver.find_elements_by_class_name("  column-player")

it should be

names = driver.find_elements_by_class_name("column-player")

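(I know the page's HTML has spaces before column-player in the class attribute, but those spaces only separate class names and are not part of the name, so don't put them inside the locator. Selenium sends a class-name locator to the browser as a CSS selector, and ".  column-player" is not a valid one, which is exactly the InvalidSelectorException in your traceback.)

2. Add a wait, so that you access these elements only after the page has finished loading.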
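To see why the leading spaces matter: in HTML the class attribute is a whitespace-separated list of tokens, so class="  column-player" just means the element has the single class column-player. As far as I know, Selenium 4's Python bindings convert a By.CLASS_NAME locator into a CSS selector by prefixing a dot, so "  column-player" turns into ".  column-player", which the browser rejects. The sketch below runs with the standard library only; ClassCollector, class_name_to_css and the sample table cell are hypothetical illustrations, not part of Selenium's API or the real page.

```python
from html.parser import HTMLParser


class ClassCollector(HTMLParser):
    """Hypothetical helper: records the class tokens of every start tag."""

    def __init__(self):
        super().__init__()
        self.classes = []  # one token list per tag that has a class attribute

    def handle_starttag(self, tag, attrs):
        for attr, value in attrs:
            if attr == "class" and value is not None:
                # Any run of whitespace separates class names, as in a browser
                self.classes.append(value.split())


def class_name_to_css(name):
    """Hypothetical helper mirroring how a class-name locator becomes a CSS selector."""
    return "." + name


# Hypothetical table cell with leading spaces in its class attribute
sample = '<td class="  column-player"><a href="#">ExamplePlayer</a></td>'
collector = ClassCollector()
collector.feed(sample)

print(collector.classes)                     # [['column-player']]
print(class_name_to_css("  column-player"))  # .  column-player  (not a valid class selector)
print(class_name_to_css("column-player"))    # .column-player
```

The parsed class list contains only column-player, with no spaces, which is why the locator should use just that token.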
This should work better:

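import time

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

url = 'https://prosettings.net/cs-go-pro-settings-gear-list/'

# Selenium 4 style: pass the chromedriver path through a Service object
# (executable_path is deprecated); the raw string keeps the backslashes literal
service = Service(r"C:\WebDrivers\chromedriver.exe")
driver = webdriver.Chrome(service=service)
wait = WebDriverWait(driver, 20)

driver.get(url)

# wait for at least 1 element to become visible
wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, ".column-player")))
# short additional wait for all the other elements to finish loading
time.sleep(0.5)
# class name without the leading spaces; find_elements(By...) replaces the deprecated find_elements_by_*
names = driver.find_elements(By.CLASS_NAME, "column-player")

for name in names:
    title = name.find_element(By.XPATH, './/a').text
    print(title)  # print the link text, not the WebElement object

driver.quit()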
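The WebDriverWait(driver, 20).until(...) call above works by polling: it calls the condition repeatedly (every 0.5 s by default) until the condition returns something truthy, hands that value back, and raises a TimeoutException once the 20 s are up. The browser-free sketch below reproduces just that idea; wait_until and fake_elements_loaded are hypothetical names, and this is a simplification, not Selenium's actual implementation (which, for example, also ignores NoSuchElementException while polling).

```python
import time


def wait_until(condition, timeout=20.0, poll_frequency=0.5):
    """Hypothetical helper: call `condition` until it returns a truthy value or time runs out."""
    end_time = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            return value  # like WebDriverWait.until, return the truthy result
        if time.monotonic() > end_time:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll_frequency)


# Hypothetical condition that only "finds" the elements on the third poll,
# standing in for a page that is still loading.
calls = {"count": 0}


def fake_elements_loaded():
    calls["count"] += 1
    return ["cell-1", "cell-2"] if calls["count"] >= 3 else []


elements = wait_until(fake_elements_loaded, timeout=5, poll_frequency=0.01)
print(elements, "after", calls["count"], "polls")
```

Waiting on a condition like this returns as soon as the elements exist, so it is usually more robust than a fixed delay alone; the short time.sleep(0.5) in the answer is only a small extra margin on top of it.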

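Since the goal is a spreadsheet, one option is to write the collected names to a CSV file, which Excel or Google Sheets can open. A minimal sketch with the standard csv module, assuming the title strings from the loop have been appended to a list instead of printed; the write_titles_csv helper, the sample names and the players.csv filename are hypothetical placeholders.

```python
import csv
import io


def write_titles_csv(titles, stream):
    """Hypothetical helper: write one player name per row under a 'player' header."""
    writer = csv.writer(stream)
    writer.writerow(["player"])  # header row
    for title in titles:
        writer.writerow([title])


# Hypothetical values standing in for the scraped `title` strings
titles = ["PlayerOne", "PlayerTwo"]

# An in-memory buffer keeps the sketch self-contained; for a real file use
# open("players.csv", "w", newline="", encoding="utf-8") instead.
buffer = io.StringIO()
write_titles_csv(titles, buffer)
print(buffer.getvalue())
```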