Web scraping with Selenium on a dynamic table

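I'm trying to scrape a table from a dynamic website (I believe it updates its data every 10 seconds) and load it into a pandas DataFrame, but I can't get past the first step of extracting the first column. Can anyone tell me what I'm doing wrong? Thanks.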

# import libraries
import urllib.request
from bs4 import BeautifulSoup
from selenium import webdriver
import time
import pandas as pd

urlpage = 'https://new.cryptoxscanner.com/binance/live'

driver = webdriver.Chrome(executable_path=r"C:\Users\xxxxx\Desktop\chrome\chromedriver.exe")

driver.get(urlpage)
time.sleep(10)
ticker = driver.find_element_by_xpath('//*[@id="scroll-source-1"]/table/tbody/tr[2]')

First, you need to wait until the data has been located, using .visibility_of_all_elements_located. You can wait on this locator:

//table[contains(@class, "table-sm")]//a
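The explicit wait replaces the question's fixed time.sleep(10). WebDriverWait(driver, 20).until(...) evaluates its condition repeatedly (every 0.5 s by default) and returns as soon as the condition yields something truthy, or raises TimeoutException after 20 s. The plain-Python sketch below imitates that polling loop to show the idea. It is a simplification, not Selenium's implementation: wait_until and WaitTimeout are hypothetical names, the condition is called without a driver argument, and LookupError stands in for Selenium's NoSuchElementException.

```python
import time


class WaitTimeout(Exception):
    """Hypothetical stand-in for selenium.common.exceptions.TimeoutException."""


def wait_until(condition, timeout=20.0, poll_frequency=0.5, ignored=(LookupError,)):
    """Poll condition() until it returns a truthy value; raise WaitTimeout after `timeout` s.

    Simplified illustration of WebDriverWait(...).until(); LookupError plays the
    role of NoSuchElementException ("element not in the DOM yet").
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            value = condition()
            if value:
                return value  # like until(), return whatever the condition produced
        except ignored:
            pass  # not ready yet; try again on the next poll
        if time.monotonic() >= deadline:
            raise WaitTimeout(f"condition not met within {timeout} s")
        time.sleep(poll_frequency)


if __name__ == "__main__":
    attempts = {"n": 0}

    def rows_loaded():
        # Pretend the table is still empty on the first two polls.
        attempts["n"] += 1
        return ["row 1", "row 2"] if attempts["n"] >= 3 else []

    print(wait_until(rows_loaded, timeout=5, poll_frequency=0.1))  # prints ['row 1', 'row 2']
```

Compared with a fixed sleep, the loop continues as soon as the rows exist, and it fails loudly instead of carrying on silently when they never appear.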

Once all the data has been located, you can extract the table data. Try the code below:

driver.get('https://new.cryptoxscanner.com/binance/live')

#UPDATED HERE
option = Select(WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, '//select[contains(., "All")]'))))
option.select_by_visible_text('All')

WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, '//table[contains(@class, "table-sm")]//a')))
data = driver.find_element(By.CLASS_NAME, 'table-responsive')  # find_element_by_class_name was removed in Selenium 4.3
print(data.text)

With the following imports:

#UPDATED HERE
from selenium.webdriver.support.ui import Select
from selenium.webdriver.common.by import By

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
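The answer prints the table as one block of text, while the question asks for a pandas DataFrame. One possible way to get there (a sketch, not the answerer's code) is to read the table's rendered HTML with get_attribute("outerHTML") and split it into rows with the standard-library html.parser. TableRowParser, table_rows and main are hypothetical names. The sketch also uses find_element(By..., ...) instead of the find_element_by_* helpers, which were removed in Selenium 4.3. It assumes the page still serves the same markup, that the first matching table is the one you want, and that its first row is a header with the same number of cells as the data rows.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the <tr> currently open
        self._cell = None  # text fragments of the <td>/<th> currently open

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text inside nested tags such as <a> also lands here.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            # Collapse runs of whitespace/newlines into single spaces.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # skip rows without any cells
                self.rows.append(self._row)
            self._row = None


def table_rows(html):
    """Return the rows of an HTML table fragment as lists of cell strings."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def main(url="https://new.cryptoxscanner.com/binance/live"):
    """Open the page, wait for the table, and return it as a pandas DataFrame."""
    # Imported here so table_rows() works without Selenium or pandas installed.
    import pandas as pd
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import Select, WebDriverWait

    table_xpath = '//table[contains(@class, "table-sm")]'
    driver = webdriver.Chrome()  # Selenium 4.6+ finds chromedriver via Selenium Manager
    try:
        driver.get(url)
        wait = WebDriverWait(driver, 20)
        # Same steps as the answer: show "All" rows, then wait for the links in the table.
        Select(wait.until(EC.element_to_be_clickable(
            (By.XPATH, '//select[contains(., "All")]')))).select_by_visible_text("All")
        wait.until(EC.visibility_of_all_elements_located((By.XPATH, table_xpath + "//a")))
        html = driver.find_element(By.XPATH, table_xpath).get_attribute("outerHTML")
    finally:
        driver.quit()

    rows = table_rows(html)
    # Assumption: the first row is the header and every row has the same number of cells.
    return pd.DataFrame(rows[1:], columns=rows[0])
```

Calling df = main() and then print(df.head()) should show the first rows, provided the site still renders this table. pandas.read_html(html) would shorten the parsing step, but it needs lxml (or BeautifulSoup plus html5lib) installed, which is why the sketch sticks to the standard-library parser.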
