How to open each product within a website in a new tab for scraping using Selenium through Python
Selenium Python, parsing through website, opening a new tab, and scraping
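I'm new to Python and Selenium. I'm trying to do something, and I'm sure I'm going about it in a very roundabout way, so any help is greatly appreciated.
The page I'm trying to parse has a number of cards that need to be clicked. I need to go to each card and, from there, grab the name (h1) and the URL. I haven't gotten very far; this is what I have so far.
I go through the first page, grab all the URLs, and add them to a list. Then I want to go through the list, visit each URL (opening a new tab), and grab the h1 and the URL from there. It seems I can't even grab the h1: it opens a new tab, hangs, and then opens the same tab again.
Thank you in advance!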
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import time

driver = webdriver.Chrome()
driver.get('https://zdb.pedaily.cn/enterprise//')  #main URL
title_links = driver.find_elements_by_css_selector('ul.n4 a')
urls = []  #list of URLs
# main = driver.find_elements_by_id('enterprise-list')
for item in title_links:
    urls.append(item.get_attribute('href'))
# print(urls)
for url in urls:
    driver.execute_script("window.open('');")
    driver.switch_to.window(driver.window_handles[1])
    driver.get(url)
    print(driver.find_element_by_css_selector('div.info h1'))
Well, there are a few issues here:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait

driver = webdriver.Chrome()
driver.get('https://zdb.pedaily.cn/enterprise/')  # main URL

# Be much more specific or you'll get multiple returns of the same link
links = driver.find_elements(By.CSS_SELECTOR, 'ul.n4 li div.img a')

for link in links:
    # Read the href once so it can be printed and passed to the new tab
    href = link.get_attribute('href')
    print(href)
    # Inject JS to open the URL in a new tab
    driver.execute_script("window.open(arguments[0])", href)
    # Switch focus to the most recently opened tab
    driver.switch_to.window(driver.window_handles[-1])
    # Make sure what we want has time to load and exists before trying to grab it
    WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, 'div.info h1')))
    # Grab it and print its contents
    print(driver.find_element(By.CSS_SELECTOR, 'div.info h1').text)
    # Uncomment the next line to do one tab at a time. Will reduce speed but not use so much RAM.
    #driver.close()
    # Focus back on first window
    driver.switch_to.window(driver.window_handles[0])

# Close all windows and end the session
driver.quit()
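
If narrowing the selector is not enough, or the page layout changes, another option is to de-duplicate the collected href values before visiting them. The following is only a minimal sketch of that idea: the helper name `unique_hrefs` and the `example.com` URLs are made up for illustration and are not part of the original question or of Selenium.

```python
from typing import Iterable, List, Optional


def unique_hrefs(hrefs: Iterable[Optional[str]]) -> List[str]:
    """Drop missing/empty hrefs and duplicates, keeping first-seen order."""
    # dict keeps insertion order (Python 3.7+), so fromkeys removes
    # duplicates without reordering the links.
    return list(dict.fromkeys(h for h in hrefs if h))


# Placeholder values standing in for what get_attribute('href') might return;
# None represents an <a> element without an href attribute.
sample = [
    "https://example.com/a",
    "https://example.com/b",
    "https://example.com/a",
    None,
]
print(unique_hrefs(sample))
```

With the code from the question, this could be applied as `urls = unique_hrefs(item.get_attribute('href') for item in title_links)`, so each page is opened only once even when several anchors on a card point to the same URL.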