Web scraping with Selenium in Python - trouble retrieving all the data

Question

I am trying to scrape coinmarketcap.com using Selenium to retrieve data such as coin name, market cap, price and circulating supply. However, I have not been successful: I can only retrieve 11 altcoins and no more. I have also looked into several ways to render JavaScript (which I presume coinmarketcap is built with). Here is the start of my code:

from selenium import webdriver

driver = webdriver.Chrome(r'C:\Users\Ejer\PycharmProjects\pythonProject\chromedriver')
driver.get('https://coinmarketcap.com/')

Crypto = driver.find_elements_by_xpath("//div[contains(concat(' ', normalize-space(@class), ' '), 'sc-16r8icm-0 sc-1teo54s-1 lgwUsc')]")
#price = driver.find_elements_by_xpath('//td[@class="cmc-link"]')
#coincap = driver.find_elements_by_xpath('//td[@class="DAY"]')

CMC_list = []
for c in range(len(Crypto)):
    CMC_list.append(Crypto[c].text)
print(CMC_list)

driver.close()

My goal is to store the names, market cap, price and circulating supply in a dataframe so I can apply machine learning methods and analyze the data, so I am open to any suggestions. Thanks in advance.

Answer 1

To retrieve the list of coin names, you need to close the cookie bar, close the popup, and induce WebDriverWait for visibility_of_all_elements_located(). You can use either of the following locator strategies:

Using CSS_SELECTOR and get_attribute("innerHTML"):

driver.get("https://coinmarketcap.com/")
# Close the cookie banner, then dismiss the popup
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "div.cmc-cookie-policy-banner__close"))).click()
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//button/b[text()='No, thanks']"))).click()
# Wait until the coin-name elements are visible, then read their inner HTML
print([my_elem.get_attribute("innerHTML") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "table.cmc-table tbody tr td > a p[color='text']")))])

Using XPATH and the text attribute:

driver.get("https://coinmarketcap.com/")
# Close the cookie banner, then dismiss the popup
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "div.cmc-cookie-policy-banner__close"))).click()
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//button/b[text()='No, thanks']"))).click()
# Wait until the coin-name elements are visible, then read their text
print([my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//table[contains(@class, 'cmc-table')]//tbody//tr//td/a//p[@color='text']")))])
driver.quit()

Console Output:

 ['Bitcoin', 'Ethereum', 'XRP', 'Tether', 'Litecoin', 'Bitcoin Cash', 'Chainlink', 'Cardano', 'Polkadot', 'Binance Coin', 'Stellar', 'USD Coin', 'Bitcoin SV']

Note: You have to add the following imports:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
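
The question's end goal is a dataframe of name, price, market cap and circulating supply, while the snippets above only return a list of names. Here is a minimal sketch of the remaining step, assuming each column has already been scraped into its own list of strings in the same row order. The helper names `parse_number` and `build_records` and the sample values are made up for illustration, and the exact text format of the site's cells is an assumption:

```python
import re


def parse_number(text):
    """Turn a scraped cell such as '$19,174.25' or '18,566,287 BTC' into a float.

    Assumes the cell holds one number with optional thousands separators;
    returns None when no number is found.
    """
    match = re.search(r"-?\d[\d,]*(?:\.\d+)?", text)
    if match is None:
        return None
    return float(match.group(0).replace(",", ""))


def build_records(names, prices, market_caps, supplies):
    """Zip the per-column lists into one dict per coin."""
    return [
        {
            "name": name,
            "price": parse_number(price),
            "market_cap": parse_number(cap),
            "circulating_supply": parse_number(supply),
        }
        for name, price, cap, supply in zip(names, prices, market_caps, supplies)
    ]


# Hypothetical sample values, standing in for the .text of the scraped cells.
records = build_records(
    ["Bitcoin", "Ethereum"],
    ["$19,174.25", "$589.10"],
    ["$355,981,040,000", "$66,989,000,000"],
    ["18,566,287 BTC", "113,716,000 ETH"],
)
print(records[0])
```

A list of dicts like `records` can be passed straight to `pandas.DataFrame(records)`, giving one row per coin and one column per key.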

Answer 2

Facing the same problem, I added page scrolling before Crypto = driver.find_elements_by_xpath..., like this:

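import time

SCROLL_PAUSE_TIME = 1  # seconds between scrolls; assumed value, not given in the original post

# Scroll down one viewport height per step so the remaining coins get loaded
for _ in range(15):
    driver.execute_script("window.scrollBy(0, window.innerHeight)")
    time.sleep(SCROLL_PAUSE_TIME)
Crypto = driver.find_elements_by_xpath('//div[@class="sc-16r8icm-0 sc-1teo54s-0 dBKWCw"]')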

On my laptop, scrolling down the page 13 times is enough to load all 100 coins; I used 15 just to be sure. The next step is to get the refreshed content. Perhaps I will have to repeat the scrolling every 1 or 2 minutes to get it. This is my first post here, and inserting the code was hard enough. I hope it's useful.
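
A fixed count of 15 scrolls depends on screen height and connection speed. One alternative is to keep scrolling until the number of located rows stops growing. This is a sketch, not a tested recipe for coinmarketcap.com: `scroll_until_stable`, `count_rows` and `patience` are hypothetical names, the default values are guesses, and `count_rows` is assumed to wrap whatever `find_elements` call locates the coin rows.

```python
import time


def scroll_until_stable(driver, count_rows, pause=1.0, patience=2, max_scrolls=50):
    """Scroll one viewport at a time until count_rows() stops increasing.

    driver:      a Selenium WebDriver (only execute_script is used)
    count_rows:  zero-argument callable returning how many rows are loaded
    pause:       seconds to wait after each scroll (assumed value)
    patience:    consecutive scrolls without new rows before giving up
    max_scrolls: safety cap so the loop always terminates
    Returns the highest row count seen.
    """
    best = count_rows()
    unchanged = 0
    for _ in range(max_scrolls):
        driver.execute_script("window.scrollBy(0, window.innerHeight)")
        time.sleep(pause)
        current = count_rows()
        if current > best:
            best = current
            unchanged = 0
        else:
            unchanged += 1
            if unchanged >= patience:
                break  # no new rows for several scrolls in a row
    return best
```

With the locator from this answer, `count_rows` could be something like `lambda: len(driver.find_elements_by_xpath('//div[@class="sc-16r8icm-0 sc-1teo54s-0 dBKWCw"]'))`. The `patience` parameter exists because a single unchanged count may only mean the pause was too short.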
