selenium - Unable to scrape a table
I'm unable to scrape the table from https://solanabeach.io/validators . For some reason, I can't access it using the following code snippet. Does anyone have an idea why I'm unable to scrape the table?
from bs4 import BeautifulSoup
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
import pandas as pd
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.action_chains import ActionChains
from selenium.common.exceptions import NoSuchElementException, StaleElementReferenceException
options = webdriver.ChromeOptions()
options.add_argument('headless')
options.add_argument("--enable-javascript")
options.add_argument('--no-sandbox')
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
driver = webdriver.Chrome(ChromeDriverManager().install(), options=options)
driver.get(f"https://solanabeach.io/validators")
driver.implicitly_wait(10)
api = BeautifulSoup(driver.find_element_by_xpath("//*").get_attribute("outerHTML"), 'html.parser')
table = api.findAll('tbody')
print(table)
driver.quit()
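One likely culprit: the table on this page is rendered client-side, so `outerHTML` grabbed right after `driver.get()` can still contain an empty `<tbody>`; `//*` matches the bare `<html>` element immediately, so the implicit wait never fires. A sketch of the idea, with the extraction step demonstrated on a stub HTML string (the `tbody_cells` helper and sample markup are hypothetical) so it runs without a browser; on the real page you would first wait for the rows, e.g. `WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "tbody tr")))`, then feed `driver.page_source` in:

```python
import re

def tbody_cells(html):
    """Pull the text of each <td>, grouped by <tr>, out of rendered HTML."""
    rows = re.findall(r"<tr>(.*?)</tr>", html, re.S)
    return [re.findall(r"<td[^>]*>(.*?)</td>", row, re.S) for row in rows]

# Stub standing in for driver.page_source taken AFTER the explicit wait
sample = "<tbody><tr><td>20</td><td>VymD</td></tr></tbody>"
print(tbody_cells(sample))  # [['20', 'VymD']]
```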
You don't need to use BeautifulSoup here. You can simply use Selenium's own methods, waiting explicitly until the table rows are present.
from selenium.webdriver.support import expected_conditions as EC
import re
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
PATH = r"C:\Users\deepak.mathpal\IdeaProjects\Selenium4\src\main\resources\chromedriver_chrome_95\chromedriver.exe"
driver = webdriver.Chrome(PATH)
url = 'https://solanabeach.io/validators'
driver.get(url)
driver.maximize_window()
# Wait until at least one row of the validators table has rendered
WebDriverWait(driver, 20).until(
    EC.visibility_of_element_located((By.XPATH, "//table[@class='table table-bordered maintable "
                                                "table-striped-even']//tbody/tr")))
# Header row: collapse its whitespace into a pipe-separated line
columnHeader = driver.find_element(By.XPATH, "//table[@class='table table-bordered maintable "
                                             "table-striped-even']//thead")
print("---------------------------------------------------------------------------------------")
print(re.sub(r"\s+", '|', columnHeader.text.strip()))
print("---------------------------------------------------------------------------------------")
# Body rows; drop the first two, then print each row's text
textInPage = driver.find_elements(By.XPATH, "//table[@class='table table-bordered maintable "
                                            "table-striped-even']//tbody/tr")
del textInPage[:2]
for element in textInPage:
    print(element.text)
    print("---------------------------------------------------------------------------------------")
driver.quit()
Output:
--------------------------------------------------------------------------------
#|VALIDATOR|STAKE|CUMULATIVE|STAKE|COMMISSION|LAST|VOTE
--------------------------------------------------------------------------------
20
VymD
1.8.3
3,896,651(81)
0.98 %
34.5 %
100 %
106,485,715
---------------------------------------------------------------------------------------
21
5KAX...PRuw
1.7.14
3,577,932(59)
0.90 %
35.4 %
100 %
106,485,711
And so on till 1270 items.
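If the goal is structured data rather than printed text, each row above is just the element's `.text` with one cell per line, so it can be split into records. A minimal sketch, assuming the column order shown in the output; the field names and the helpers `parse_row`/`parse_stake` are my own, not from the page:

```python
import re

# Guessed field names, in the order the cells appear in each printed row
FIELDS = ["rank", "name", "version", "stake", "stake_pct",
          "cumulative_pct", "commission", "last_vote"]

def parse_row(text):
    """Split one row's .text (newline-separated cells) into a dict."""
    return dict(zip(FIELDS, text.splitlines()))

def parse_stake(cell):
    """'3,896,651(81)' -> (3896651, 81)"""
    m = re.fullmatch(r"([\d,]+)\((\d+)\)", cell)
    return int(m.group(1).replace(",", "")), int(m.group(2))

row = parse_row("20\nVymD\n1.8.3\n3,896,651(81)\n0.98 %\n34.5 %\n100 %\n106,485,715")
print(row["name"], parse_stake(row["stake"]))  # VymD (3896651, 81)
```

Collecting such dicts into a list makes a natural input for `pd.DataFrame(...)`, which the question already imports pandas for.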