When I try to scrape the website, it just throws some errors. I think it may have something to do with my webdriver, but I'm not sure.
I am trying to get this data so I can put it in a spreadsheet and compute some cool statistics.
from selenium import webdriver
url = 'https://prosettings.net/cs-go-pro-settings-gear-list/'
driver = webdriver.Chrome(executable_path="C:\WebDrivers\chromedriver.exe")
driver.get(url)
names = driver.find_elements_by_class_name(" column-player")
for name in names:
    title = name.find_element_by_xpath('.//a').text
    print(name)
Here are the errors I get in the terminal:
d:\downloads\PythonScraping\Test.py:5: DeprecationWarning: executable_path has been deprecated, please pass in a Service object
driver = webdriver.Chrome(executable_path="C:\WebDrivers\chromedriver.exe")
DevTools listening on ws://127.0.0.1:53131/devtools/browser/73ca0453-352e-47a0-a98a-fb539150d6f9
d:\downloads\PythonScraping\Test.py:8: DeprecationWarning: find_elements_by_* commands are deprecated. Please use find_elements() instead
names = driver.find_elements_by_class_name(" column-player")
Traceback (most recent call last):
File "d:\downloads\PythonScraping\Test.py", line 8, in <module>
names = driver.find_elements_by_class_name(" column-player")
File "C:\Users\terk0\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\selenium\webdriver\remote\webdriver.py", line 783, in
find_elements_by_class_name
return self.find_elements(by=By.CLASS_NAME, value=name)
File "C:\Users\terk0\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\selenium\webdriver\remote\webdriver.py", line 1279, in find_elements
return self.execute(Command.FIND_ELEMENTS, {
File "C:\Users\terk0\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\selenium\webdriver\remote\webdriver.py", line 424, in
execute
self.error_handler.check_response(response)
File "C:\Users\terk0\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\selenium\webdriver\remote\errorhandler.py", line 247,
in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.InvalidSelectorException: Message: invalid selector: An invalid or illegal selector was specified
(Session info: chrome=98.0.4758.102)
Stacktrace:
Backtrace:
Ordinal0 [0x00EF69A3+2582947]
Ordinal0 [0x00E8A6D1+2139857]
Ordinal0 [0x00D83A98+1063576]
Ordinal0 [0x00D862B7+1073847]
Ordinal0 [0x00D8617E+1073534]
Ordinal0 [0x00D863F0+1074160]
Ordinal0 [0x00DAFCB2+1244338]
Ordinal0 [0x00DB013B+1245499]
Ordinal0 [0x00DD9F8C+1417100]
Ordinal0 [0x00DC8594+1344916]
Ordinal0 [0x00DD834A+1409866]
Ordinal0 [0x00DC8366+1344358]
Ordinal0 [0x00DA5176+1200502]
Ordinal0 [0x00DA6066+1204326]
GetHandleVerifier [0x0109BE02+1675858]
GetHandleVerifier [0x0115036C+2414524]
GetHandleVerifier [0x00F8BB01+560977]
GetHandleVerifier [0x00F8A8D3+556323]
Ordinal0 [0x00E9020E+2163214]
Ordinal0 [0x00E95078+2183288]
Ordinal0 [0x00E951C0+2183616]
Ordinal0 [0x00E9EE1C+2223644]
BaseThreadInitThunk [0x7586FA29+25]
RtlGetAppContainerNamedObjectPath [0x77957A9E+286]
RtlGetAppContainerNamedObjectPath [0x77957A6E+238]
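There are 2 problems here:
1) The class name in the locator starts with a space:
names = driver.find_elements_by_class_name(" column-player")
It should be
names = driver.find_elements_by_class_name("column-player")
(I know the class attribute in the page source has spaces before column-player, but you still should not put them inside the locator. As far as I can tell, Selenium 4 turns a class-name locator into a CSS selector by prefixing a dot, so the leading space produces ". column-player", which is not a valid selector. That is the InvalidSelectorException in your traceback.)
2) You should add a delay so that these elements are accessed only after the page has completely loaded.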
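To illustrate the first problem, here is a minimal sketch in plain Python (no browser needed). Both `naive_class_selector` and `class_attr_to_css` are hypothetical helper names, not part of Selenium; the first only imitates the "prefix a dot" conversion described above, and the second shows one way to clean a class attribute copied from the page source before using it as a locator.

```python
def naive_class_selector(value: str) -> str:
    """Imitate prefixing a dot to a class name, with no cleanup (assumed behaviour)."""
    return "." + value


def class_attr_to_css(class_attr: str) -> str:
    """Build a CSS selector from a raw class attribute value.

    Whitespace is stripped and each class name gets its own leading dot,
    so "a b" becomes ".a.b" (an element that has both classes).
    """
    tokens = class_attr.split()
    if not tokens:
        raise ValueError("class attribute contains no class names")
    return "".join("." + token for token in tokens)


# The leading space survives and ends up right after the dot.
print(repr(naive_class_selector(" column-player")))  # '. column-player'
# After cleanup the selector is well formed.
print(repr(class_attr_to_css(" column-player")))     # '.column-player'
```

The cleaned string could then be passed to a CSS-selector lookup, or you can simply type the class name without the space as in the corrected line.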
This should work better:
import time

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

url = 'https://prosettings.net/cs-go-pro-settings-gear-list/'
# Selenium 4: pass the driver path through a Service object instead of executable_path
driver = webdriver.Chrome(service=Service(r"C:\WebDrivers\chromedriver.exe"))
wait = WebDriverWait(driver, 20)
driver.get(url)
# wait until at least one matching element is visible
wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, ".column-player")))
# short additional wait for the other elements to finish loading
time.sleep(0.5)
# find_elements_by_* is deprecated; use find_elements(By..., ...) instead
names = driver.find_elements(By.CLASS_NAME, "column-player")
for name in names:
    # print the link text, not the WebElement object
    title = name.find_element(By.XPATH, './/a').text
    print(title)
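If it helps to see what an explicit wait does conceptually, here is a rough sketch of the polling idea in plain Python. It is not Selenium's implementation; `wait_until` and `fake_find_rows` are hypothetical names used only for illustration, and the fake lookup stands in for a page whose rows appear after a short delay.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_until(condition: Callable[[], Optional[T]],
               timeout: float = 20.0,
               poll: float = 0.5) -> T:
    """Call `condition` repeatedly until it returns a truthy value or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll)


# Stand-in for a page that only has its rows on the third lookup.
calls = {"count": 0}


def fake_find_rows() -> list:
    calls["count"] += 1
    return ["row"] * 3 if calls["count"] >= 3 else []


rows = wait_until(fake_find_rows, timeout=2.0, poll=0.01)
print(len(rows))  # 3
```

With a real driver, the condition could be something like `lambda: driver.find_elements(By.CLASS_NAME, "column-player")`, which returns an empty list until matching elements exist. In practice `WebDriverWait` with an expected condition already covers this, so the sketch is only meant to show why a fixed `time.sleep` is not the only option.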