How to click on a link while web scraping with Selenium in Python?

I am using Selenium for Python. When I use an XPath to click on a link, I get the error TimeoutException: Message:. I've tried By.ID and By.TAG_NAME, but it seems like the link is hidden. How can I click these two links?

Here is my code for the first link:

btn = WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH,"/html/body/div[3]/div/div/div[2]/div/div/div/div[2]/div/div/div[1]/div/div/div[2]/div/div/div")))
btn.click()


<div class="lib_33_IXqu lib_10OTPLG lib_rljvxxj lib_2bmVxh4 lib_AWe8PWK lib_NH5Lx3B lib_AWe8PWK"><div class="">Most Active<div class="lib_gdMpTuS lib_3Wb397t lib_QVji0M8 lib_1dwKEN3 lib_2IaUGOQ" aria-hidden="true">Most Active</div></div></div>

<div class="" data-selected="false"><div class="lib_33_IXqu lib_10OTPLG lib_rljvxxj lib_2bmVxh4 lib_AWe8PWK lib_NH5Lx3B lib_AWe8PWK"><div class="">Watchers<div class="lib_gdMpTuS lib_3Wb397t lib_QVji0M8 lib_1dwKEN3 lib_2IaUGOQ" aria-hidden="true">Watchers</div></div></div></div>

This code should work. Additionally, I noticed that an overlay pops up after a few seconds and can intercept your mouse clicks. I've added a line of code to close it too.

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
import time

# location of chromedriver.exe (Selenium 4 takes the path through a Service object)
driver = webdriver.Chrome(service=Service("D:/chromedriver/94/chromedriver.exe"))

driver.get("https://stocktwits.com/rankings/trending")

# waiting for the links to be available
WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.XPATH, '//div[@class="lib_33_IXqu lib_10OTPLG lib_rljvxxj lib_2bmVxh4 lib_AWe8PWK lib_NH5Lx3B lib_AWe8PWK"]')))

# capturing the links
links = driver.find_elements(By.XPATH, '//div[@class="lib_33_IXqu lib_10OTPLG lib_rljvxxj lib_2bmVxh4 lib_AWe8PWK lib_NH5Lx3B lib_AWe8PWK"]')

# get rid of the overlay message
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, '//button[@class="ub-emb-close"]'))).click()

# looping through the links since they all have the same class
for link in links:
    link.click()
    # do something
    time.sleep(2)

You could also access these links directly by their URLs:

  • stocktwits.com/rankings/most-active
  • stocktwits.com/rankings/watchers
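
If clicking the tabs proves flaky, the direct-URL route could be sketched roughly like this. The helper names ranking_urls and visit_all are made up for illustration, the https scheme is assumed from the driver.get call in the code above, and driver stands for any already-started Selenium WebDriver.

```python
BASE_URL = "https://stocktwits.com/rankings/"  # assumed from the URL used above


def ranking_urls(slugs):
    """Build one full URL per ranking page slug (hypothetical helper)."""
    return [BASE_URL + slug for slug in slugs]


def visit_all(driver, urls):
    """Load each URL in turn and return the list of URLs that were visited."""
    visited = []
    for url in urls:
        driver.get(url)  # standard WebDriver navigation call
        # do something with the loaded page here
        visited.append(url)
    return visited
```

A call would then look like visit_all(driver, ranking_urls(["most-active", "watchers"])). This avoids the click entirely, so the hidden element and the overlay no longer matter for navigation.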

I see that the overlay pops up again a few times, after maybe a minute. You could use a function to create a script:

def close_overlay():
    # plain string (not an f-string), so the JavaScript braces are written once
    return """
setInterval(() => {
    var overlay = document.querySelector('button[class="ub-emb-close"]');
    if (overlay) { overlay.click(); }
}, 5000);
"""

Later, call it somewhere in your script like this:

driver.execute_script(close_overlay())

This little script checks for the close button on that overlay every 5 seconds and clicks it if it is there.
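
If the selector or the interval ever needs to change, a parameterized variant might look like the sketch below. The name close_overlay_script and its two parameters are my own, not part of the answer; json.dumps is used only to produce a correctly quoted JavaScript string literal for the selector.

```python
import json


def close_overlay_script(selector='button[class="ub-emb-close"]', interval_ms=5000):
    """Return JavaScript that clicks the element matching `selector`
    every `interval_ms` milliseconds, if such an element exists."""
    js_selector = json.dumps(selector)  # quoted and escaped for JavaScript
    return (
        "setInterval(() => {"
        f" var overlay = document.querySelector({js_selector});"
        " if (overlay) { overlay.click(); }"
        f" }}, {int(interval_ms)});"
    )
```

It would be injected the same way, with driver.execute_script(close_overlay_script()). Keep in mind that a timer installed like this lives only in the current page, so it has to be injected again after each navigation or reload.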

Note: This script could try to click the close button at the same moment your main bot is trying to click, which could lead to an ElementClickInterceptedException. You could handle this exception in your code.
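
One way to handle that exception is a small retry wrapper around the click. This is only a sketch: click_with_retry, attempts and delay are hypothetical names, and the fallback class exists solely so the snippet can run where Selenium is not installed.

```python
import time

try:
    from selenium.common.exceptions import ElementClickInterceptedException
except ImportError:
    # Fallback so the sketch can run without Selenium installed
    class ElementClickInterceptedException(Exception):
        pass


def click_with_retry(element, attempts=3, delay=1.0):
    """Click `element`, retrying when something else intercepts the click.

    Returns the number of tries it took; re-raises after the last attempt.
    """
    for attempt in range(1, attempts + 1):
        try:
            element.click()
            return attempt
        except ElementClickInterceptedException:
            if attempt == attempts:
                raise  # give up and let the caller see the error
            time.sleep(delay)  # give the overlay time to be closed
```

In the loop from the answer, link.click() would become click_with_retry(link).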

This isn't required, but it might come in handy for you later.
