Selenium Web Scraping With Beautiful Soup on Dynamic Content and Hidden Data Table
I'm trying to scrape a table from a dynamic website (I believe it refreshes its data every 10 seconds) and load it into a pandas DataFrame, but I can't even get past the first step of grabbing the first column. Can someone point out what I'm doing wrong? Thanks.
# import libraries
import urllib.request
from bs4 import BeautifulSoup
from selenium import webdriver
import time
import pandas as pd
urlpage = 'https://new.cryptoxscanner.com/binance/live'
driver = webdriver.Chrome(executable_path=r"C:\Users\xxxxx\Desktop\chrome\chromedriver.exe")
driver.get(urlpage)
time.sleep(10)
ticker = driver.find_element_by_xpath('//*[@id="scroll-source-1"]/table/tbody/tr[2]')
First, you need to wait until the data is present on the page, using .visibility_of_all_elements_located. You can wait with this locator:
//table[contains(@class, "table-sm")]//a
Once all the data has loaded, you can extract the table contents. Try the code below:
driver.get('https://new.cryptoxscanner.com/binance/live')
#UPDATED HERE
option = Select(WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, '//select[contains(., "All")]'))))
option.select_by_visible_text('All')
WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, '//table[contains(@class, "table-sm")]//a')))
data = driver.find_element_by_class_name('table-responsive')
print(data.text)
With the following imports:
#UPDATED HERE
from selenium.webdriver.support.ui import Select
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
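Since the original goal was a pandas DataFrame, the text returned by Selenium still has to be parsed. A minimal sketch of that last step, parsing the table markup with BeautifulSoup instead of calling .text (the HTML fragment stands in for driver.page_source, and the column names "pair" and "price" are illustrative assumptions, not the site's actual headers):

```python
from bs4 import BeautifulSoup
import pandas as pd

# Stand-in for driver.page_source; the real page's markup will differ.
html = """
<div class="table-responsive">
  <table class="table-sm">
    <tbody>
      <tr><td><a>BTC/USDT</a></td><td>43000.5</td></tr>
      <tr><td><a>ETH/USDT</a></td><td>3200.1</td></tr>
    </tbody>
  </table>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Collect one list of cell texts per table row.
rows = []
for tr in soup.select("table.table-sm tbody tr"):
    cells = [td.get_text(strip=True) for td in tr.find_all("td")]
    if cells:
        rows.append(cells)

# Column names are assumed here for illustration.
df = pd.DataFrame(rows, columns=["pair", "price"])
print(df)
```

With the live site you would pass driver.page_source to BeautifulSoup after the WebDriverWait above has succeeded, so the rows are only parsed once they are actually rendered.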