
selenium Unable to scrape a table

I'm unable to scrape the table from https://solanabeach.io/validators . For some reason, I can't access it using the following code snippet. Does anyone have an idea why I'm unable to scrape the table?

from bs4 import BeautifulSoup
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
import pandas as pd
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.action_chains import ActionChains
from selenium.common.exceptions import NoSuchElementException, StaleElementReferenceException

options = webdriver.ChromeOptions()
options.add_argument('headless')
options.add_argument("--enable-javascript")
options.add_argument('--no-sandbox')
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)

driver = webdriver.Chrome(ChromeDriverManager().install(), options=options)

driver.get(f"https://solanabeach.io/validators")


driver.implicitly_wait(10)
api = BeautifulSoup(driver.find_element_by_xpath("//*").get_attribute("outerHTML"), 'html.parser')


table = api.findAll('tbody')

print(table)

driver.quit()

You don't need to use BeautifulSoup here. You can simply use Selenium methods.

from selenium.webdriver.support import expected_conditions as EC
import re
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait

PATH = r"C:\Users\deepak.mathpal\IdeaProjects\Selenium4\src\main\resources\chromedriver_chrome_95\chromedriver.exe"
driver = webdriver.Chrome(PATH)

url = 'https://solanabeach.io/validators'

driver.get(url)
driver.maximize_window()
WebDriverWait(driver, 20).until(
    EC.visibility_of_element_located((By.XPATH, "//table[@class='table table-bordered maintable "
                                                "table-striped-even']//tbody/tr")))

columnHeader = driver.find_element(By.XPATH, "//table[@class='table table-bordered maintable "
                                             "table-striped-even']//thead")
print("---------------------------------------------------------------------------------------")
print(re.sub(r"\s+", '|', columnHeader.text.strip()))
print("---------------------------------------------------------------------------------------")
textInPage = driver.find_elements(By.XPATH, "//table[@class='table table-bordered maintable "
                                            "table-striped-even']//tbody/tr")
del textInPage[:2]
for element in textInPage:
    print(element.text)
    print("---------------------------------------------------------------------------------------")

driver.quit()

Output:

--------------------------------------------------------------------------------
#|VALIDATOR|STAKE|CUMULATIVE|STAKE|COMMISSION|LAST|VOTE
--------------------------------------------------------------------------------
20
VymD
1.8.3
3,896,651(81)
0.98 %
34.5 %
100 %
106,485,715
---------------------------------------------------------------------------------------
21
5KAX...PRuw
1.7.14
3,577,932(59)
0.90 %
35.4 %
100 %
106,485,711

And so on, up to 1270 items.
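If you want structured records rather than raw printed text, note that each row's `.text` comes back as newline-separated cell values, and the header's `.text` as whitespace-separated words. A minimal sketch of parsing them, using hard-coded sample strings taken from the output above (in the real script these would come from `columnHeader.text` and `element.text`):

```python
import re

# Header text as Selenium returns it: cells separated by runs of whitespace.
# Note that re.sub splits multi-word headers too ("CUMULATIVE STAKE" becomes
# "CUMULATIVE|STAKE"), which is why the output above shows extra pipes.
header_text = "#  VALIDATOR  STAKE  CUMULATIVE STAKE  COMMISSION  LAST VOTE"
print(re.sub(r"\s+", "|", header_text.strip()))
# #|VALIDATOR|STAKE|CUMULATIVE|STAKE|COMMISSION|LAST|VOTE

# A row's .text is newline-separated, one line per table cell.
row_text = "20\nVymD\n1.8.3\n3,896,651(81)\n0.98 %\n34.5 %\n100 %\n106,485,715"
cells = row_text.splitlines()

# Strip the "(rank)" suffix and thousands separators to get a numeric stake.
stake = int(cells[3].split("(")[0].replace(",", ""))
print(cells[1], stake)
# VymD 3896651
```

This keeps the scraping and the parsing separate, so the parsing logic can be tested without a browser.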
