
Webscraping #shadow-root

I am attempting to web-scrape data that is hidden within a #shadow-root (open). I've managed to make some progress, but I am stuck on the last step and was wondering if someone could help me finish it.

Code:

from bs4 import BeautifulSoup as bs
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import Select
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.keys import Keys
import time as t
import pandas as pd

options = webdriver.ChromeOptions()
options.add_argument("--no-sandbox")
options.add_argument('disable-notifications')
options.add_argument("start-maximized")
options.add_experimental_option("detach", True)
browser = webdriver.Chrome(service=Service(ChromeDriverManager().install()),options=options)
actions = ActionChains(browser)

url = "https://iltacon2022.expofp.com/?aceds-association-of-certified-e-discovery-specialists"

browser.get(url)
exhibitor_el = WebDriverWait(browser, 10).until(EC.presence_of_element_located((By.XPATH, '//div[@data-event-id="iltacon2022"]/div')))
exhibitor_el_shadow_root = exhibitor_el.shadow_root
t.sleep(5)
companies_divv = exhibitor_el_shadow_root.find_element(By.CSS_SELECTOR, 'div[class="overlay-content exhibitor"]')
# NOTE: find_elements (plural) returns a list, and a list has no .text or
# .get_attribute, so each lookup below raises AttributeError and lands in except.
# (The url selector also targets a div, which has no href attribute.)
try:
    name = exhibitor_el_shadow_root.find_elements(By.CSS_SELECTOR, "div[class = 'exhibitor__bar']").text
except AttributeError:
    name = "Couldn't Find"
try:
    booth = exhibitor_el_shadow_root.find_elements(By.CSS_SELECTOR, "a[class = 'exhibitor__categories-booth']").text
except AttributeError:
    booth = "Couldn't Find"
try:
    url = exhibitor_el_shadow_root.find_elements(By.CSS_SELECTOR, "div[class = 'exhibitor__meta']").get_attribute('href')
except AttributeError:
    url = "Couldn't Find"
print(name)
print(booth)
print(url)

The output I am getting is "Couldn't Find", but I think that is just because I am misusing the CSS selectors or failing to get into the #shadow-root.
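A likely cause, independent of the selectors themselves: Selenium's `find_elements` (plural) always returns a Python list of elements, and a list has no `.text` or `.get_attribute`, so every `try` block raises `AttributeError` and falls through to "Couldn't Find". The sketch below reproduces that without a browser; `FakeElement`, `read_name`, and `read_name_fixed` are hypothetical stand-ins, not part of Selenium.

```python
class FakeElement:
    """Hypothetical stand-in for a Selenium WebElement (illustration only)."""

    def __init__(self, text):
        self.text = text


def read_name(elements):
    """Mirrors the question's code: calls .text on the whole list."""
    try:
        return elements.text  # a list has no .text -> AttributeError
    except AttributeError:
        return "Couldn't Find"


def read_name_fixed(elements):
    """Takes the first element of the list (or the default if it is empty)."""
    return elements[0].text if elements else "Couldn't Find"


# Like find_elements, this is a list even when there is exactly one match.
matches = [FakeElement("ACEDS - Association of Certified E-Discovery Specialists")]
print(read_name(matches))        # Couldn't Find
print(read_name_fixed(matches))  # ACEDS - Association of Certified E-Discovery Specialists
```

Switching to `find_element` (singular), which returns a single element, has the same effect as indexing into the list.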

Desired output:

Name: ACEDS - Association of Certified E-Discovery Specialists
Booth: 828
Url: https://www.aceds.org/
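To print the scraped values in the labeled form shown above, a small formatting step is enough. `format_exhibitor` is a hypothetical helper, and the sample values are simply the desired output from this post, not freshly scraped data.

```python
def format_exhibitor(name, booth, url):
    """Join the scraped fields into labeled lines (hypothetical helper)."""
    return f"Name: {name}\nBooth: {booth}\nUrl: {url}"


print(format_exhibitor(
    "ACEDS - Association of Certified E-Discovery Specialists",
    "828",
    "https://www.aceds.org/",
))
```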

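I managed to get it to work by searching inside the companies_divv container with find_element (singular) instead of calling find_elements on the shadow root directly: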

Code:

from selenium.common.exceptions import NoSuchElementException

exhibitor_el = WebDriverWait(browser, 10).until(EC.presence_of_element_located((By.XPATH, '//div[@data-event-id="iltacon2022"]/div')))
exhibitor_el_shadow_root = exhibitor_el.shadow_root
t.sleep(5)  # give the widget time to render inside the shadow root
# Narrow the search to the exhibitor overlay inside the shadow root
companies_divv = exhibitor_el_shadow_root.find_element(By.CSS_SELECTOR, 'div[class="overlay-content exhibitor"]')
# find_element (singular) returns one WebElement, so .text / .get_attribute work.
# When nothing matches it raises NoSuchElementException, not AttributeError.
try:
    name = companies_divv.find_element(By.CSS_SELECTOR, "div[class = 'exhibitor__bar']").text
except NoSuchElementException:
    name = "Couldn't Find"
try:
    booth = companies_divv.find_element(By.CSS_SELECTOR, "a[class = 'exhibitor__categories-booth']").text
except NoSuchElementException:
    booth = "Couldn't Find"
try:
    url = companies_divv.find_element(By.CSS_SELECTOR, "a[rel = 'noopener noreferrer']").get_attribute('href')
except NoSuchElementException:
    url = "Couldn't Find"
print(name)
print(booth)
print(url)
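A variant that avoids `try`/`except` altogether: `find_elements` returns an empty list when nothing matches, so checking the list length can stand in for catching an exception. The helper name `first_value_or` is hypothetical; it relies only on the `find_elements(by, value)` method that Selenium 4 `WebElement` and `ShadowRoot` objects both provide, and on `By.CSS_SELECTOR` being the string `"css selector"`. To keep the sketch runnable without a browser, it is exercised with hypothetical `FakeElement`/`FakeContainer` stand-ins.

```python
CSS_SELECTOR = "css selector"  # same value as selenium's By.CSS_SELECTOR


def first_value_or(container, selector, attr=None, default="Couldn't Find"):
    """Return the text (or an attribute) of the first match under `container`.

    `container` can be anything with find_elements(by, value), such as a
    Selenium WebElement or ShadowRoot. An empty result list means "not found".
    """
    matches = container.find_elements(CSS_SELECTOR, selector)
    if not matches:
        return default
    first = matches[0]
    return first.text if attr is None else first.get_attribute(attr)


class FakeElement:
    """Hypothetical stand-in for a WebElement, for running without a browser."""

    def __init__(self, text="", attrs=None):
        self.text = text
        self._attrs = attrs or {}

    def get_attribute(self, name):
        return self._attrs.get(name)


class FakeContainer:
    """Hypothetical stand-in: maps exact selector strings to element lists."""

    def __init__(self, mapping):
        self._mapping = mapping

    def find_elements(self, by, value):
        assert by == CSS_SELECTOR
        return self._mapping.get(value, [])


page = FakeContainer({
    "a[class = 'exhibitor__categories-booth']": [FakeElement("828")],
    "a[rel = 'noopener noreferrer']": [FakeElement(attrs={"href": "https://www.aceds.org/"})],
})

print(first_value_or(page, "a[class = 'exhibitor__categories-booth']"))  # 828
print(first_value_or(page, "a[rel = 'noopener noreferrer']", attr="href"))  # https://www.aceds.org/
print(first_value_or(page, "div[class = 'exhibitor__bar']"))  # Couldn't Find
```

Against the live page, `container` would be the `companies_divv` element from the code above, e.g. `first_value_or(companies_divv, "a[rel = 'noopener noreferrer']", attr="href")`.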

