
Web scraping with Selenium in Python - not retrieving all elements

I am trying to scrape coinmarketcap.com using Selenium, but I can only retrieve the first 10 altcoins on the list. I read that //div[contains(concat(' ', normalize-space(@class), ' '), 'class name')] should do the trick, but it is not working. Can someone help me? I am also aware that coinmarketcap has an API, but I just wanted to try another way.


from selenium import webdriver
from selenium.webdriver.common.by import By

# Path to the local chromedriver executable
driver = webdriver.Chrome(r'C:\Users\Ejer\PycharmProjects\pythonProject\chromedriver')
driver.get('https://coinmarketcap.com/')

# find_elements(By.XPATH, ...) replaces the deprecated find_elements_by_xpath()
Crypto = driver.find_elements(By.XPATH, "//div[contains(concat(' ', normalize-space(@class), ' '), 'sc-16r8icm-0 sc-1teo54s-1 lgwUsc')]")
#price = driver.find_elements(By.XPATH, '//td[@class="cmc-link"]')
#coincap = driver.find_elements(By.XPATH, '//td[@class="DAY"]')

# Collect the visible text of every matched element
CMC_list = [c.text for c in Crypto]
print(CMC_list)

driver.close()
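
The XPath idiom quoted in the question can be emulated in plain Python to see what it actually tests. This is only a rough sketch: the helper names normalize_space and class_contains are made up for illustration, and Python's str.split() treats a few more characters as whitespace than XPath does. It shows that the expression is a substring test on the padded class string, so a multi-class needle matches only when the classes appear adjacent and in that exact order.

```python
def normalize_space(value: str) -> str:
    """Approximate XPath normalize-space(): trim and collapse whitespace runs."""
    return " ".join(value.split())


def class_contains(class_attr: str, needle: str) -> bool:
    """Emulate contains(concat(' ', normalize-space(@class), ' '), needle)."""
    padded = f" {normalize_space(class_attr)} "
    return needle in padded


# A needle padded with spaces matches a whole class token only.
print(class_contains("btn  btn-primary", " btn "))  # True
print(class_contains("btn-primary", " btn "))       # False

# A multi-class needle is order-sensitive: it is a plain substring test.
print(class_contains("a b c", "a b"))  # True
print(class_contains("a b c", "a c"))  # False
```

Because the needle in the question has no surrounding spaces and lists three classes at once, it behaves like the last two cases: it matches only elements whose class attribute contains those three names next to each other in that order.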

To retrieve the first 10 altcoins on the list, you need to induce WebDriverWait for visibility_of_all_elements_located(). You can use either of the following locator strategies:

  • Using CSS_SELECTOR and get_attribute("innerHTML"):

     driver.get('https://coinmarketcap.com/')
     print([my_elem.get_attribute("innerHTML") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "table.cmc-table tbody tr td > a p[color='text']")))[:10]])
  • Using XPATH and the text attribute:

     driver.get('https://coinmarketcap.com/')
     print([my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//table[contains(@class, 'cmc-table')]//tbody//tr//td/a//p[@color='text']")))[:10]])
  • Console output:

     ['Bitcoin', 'Ethereum', 'XRP', 'Tether', 'Litecoin', 'Bitcoin Cash', 'Chainlink', 'Cardano', 'Polkadot', 'Binance Coin']
  • Note: You have to add the following imports:

     from selenium.webdriver.support.ui import WebDriverWait
     from selenium.webdriver.common.by import By
     from selenium.webdriver.support import expected_conditions as EC
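
Putting the pieces above together, a minimal sketch might look like the following. The function names texts_of and fetch_coin_names and the constant NAME_XPATH are hypothetical, the XPath is the one from this answer and may no longer match the live page, and the Selenium imports sit inside the function only so that the small helper can be tried without a browser.

```python
# XPath taken from the answer above; the site's markup may have changed since.
NAME_XPATH = (
    "//table[contains(@class, 'cmc-table')]"
    "//tbody//tr//td/a//p[@color='text']"
)


def texts_of(elements, limit=10):
    """Return the .text of the first `limit` elements (hypothetical helper)."""
    return [element.text for element in elements[:limit]]


def fetch_coin_names(driver, limit=10, timeout=20):
    """Open the page, wait until the name cells are visible, return their text.

    `driver` is an already created Selenium WebDriver instance.
    """
    # Imported here so texts_of() stays usable without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver.get("https://coinmarketcap.com/")
    elements = WebDriverWait(driver, timeout).until(
        EC.visibility_of_all_elements_located((By.XPATH, NAME_XPATH))
    )
    return texts_of(elements, limit)
```

With a driver created as in the question, print(fetch_coin_names(driver)) would be expected to print a list like the console output shown above, provided the locator still matches.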
