Finding a pattern in HTML elements using Selenium/BeautifulSoup

Question

I have this HTML from a page of surfing heats that I am trying to scrape:

     <div class="event-round">
           <div id="heat-85940" class="new-heat  new-heat--status-completed new-heat--athletes-4">
           <div id="heat-85941" class="new-heat  new-heat--status-completed new-heat--athletes-4">
           <div id="heat-85942" class="new-heat  new-heat--status-completed new-heat--athletes-4">
           <div id="heat-85943" class="new-heat  new-heat--status-completed new-heat--athletes-4">
     <div>

I have created a loop to scrape the heats on each page, but because the heat IDs change from page to page (i.e. they do not always start at 85940), I can only scrape one page's worth without manually changing the range I loop over.

For one page, my code looks like this:

# find_elements returns a list, so take its length directly (a list has no .text)
heat_count = len(driver.find_elements(By.CLASS_NAME, 'new-heat-hd-name'))

for h in range(heat_count):
    for i in range(4):
        name = driver.find_element(By.XPATH, f'//*[@id="heat-8594{h}"]/div/div[2]/div[{i + 1}]/div[1]/div[1]/div/div[2]/div[1]/span').text

I'm looking for a way to search the HTML for the first heat ID (heat-85940 here) and start from there, instead of finding it manually for each page.

Answer 1

You can try this:

Here I am writing only the starting section of the XPath, i.e. how to handle the dynamic value 'id="heat-85940"'. Please fill in the remaining XPath yourself, because you didn't post the URL or the full HTML source.

driver.find_element(By.XPATH, ".//*[starts-with(@id,'heat-')]...<remaining XPath until the element>")

or

driver.find_element(By.XPATH, ".//*[starts-with(@id,'heat-8594')]...<remaining XPath until the element>")
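
If the numeric IDs themselves are needed (for example, to keep the f-string XPath from the question), one option is to read them from the page source first. The following is a minimal sketch using only the standard library. `HeatIdCollector`, `find_heat_ids`, and `sample_html` are hypothetical names, the sample markup is a shortened copy of the snippet in the question, and with Selenium the HTML would presumably come from `driver.page_source`.

```python
from html.parser import HTMLParser


class HeatIdCollector(HTMLParser):
    """Collect every id attribute that starts with 'heat-' (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.heat_ids = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the current start tag
        for name, value in attrs:
            if name == "id" and value and value.startswith("heat-"):
                self.heat_ids.append(value)


def find_heat_ids(html):
    """Return the heat ids in document order."""
    collector = HeatIdCollector()
    collector.feed(html)
    return collector.heat_ids


# Shortened copy of the question's markup; in practice use driver.page_source.
sample_html = """
<div class="event-round">
  <div id="heat-85940" class="new-heat new-heat--status-completed new-heat--athletes-4"></div>
  <div id="heat-85941" class="new-heat new-heat--status-completed new-heat--athletes-4"></div>
</div>
"""

heat_ids = find_heat_ids(sample_html)
# Numeric part of the first id, usable as the start of a range()
first_heat_number = int(heat_ids[0].split("-")[1])
print(heat_ids, first_heat_number)
```

Looping over `heat_ids` directly avoids assuming that the IDs on a page are consecutive, which the question does not confirm.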

Answer 2

You can try something like:

# heats = driver.find_elements(By.XPATH, '//*[starts-with(@id,"heat-")]')
heats = driver.find_elements(By.CSS_SELECTOR, '*[id^="heat-"]')
for heat in heats:
    # the leading "." makes the XPath relative to this heat element rather than the document root
    names = heat.find_elements(By.XPATH, './div/div[2]/div/div[1]/div[1]/div/div[2]/div[1]/span')
    for n in names[:4]:
        name = n.text

or

# heats = driver.find_elements(By.XPATH, '//*[starts-with(@id,"heat-")]')
heats = driver.find_elements(By.CSS_SELECTOR, '*[id^="heat-"]')
for heat in heats:
    for i in range(4):
        # the leading "." makes the XPath relative to this heat element rather than the document root
        name = heat.find_element(By.XPATH, f'./div/div[2]/div[{i + 1}]/div[1]/div[1]/div/div[2]/div[1]/span').text

(I can't test these without more of your HTML, so I'm not fully confident in either of them.)
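
In Selenium, an XPath evaluated from an element is only scoped to that element when it starts with `.`; a path starting with `/` is resolved from the document root. The per-heat scoping idea can be tried offline with the standard library's `xml.etree.ElementTree`, which supports a small XPath subset. This is only a sketch: ElementTree stands in for Selenium here, and the markup and surfer names below are made up and far shallower than the real page.

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed markup; the real page nests the names much deeper.
sample = """
<div class="event-round">
  <div id="heat-85940"><div><span>Surfer A</span></div><div><span>Surfer B</span></div></div>
  <div id="heat-85941"><div><span>Surfer C</span></div><div><span>Surfer D</span></div></div>
</div>
""".strip()

root = ET.fromstring(sample)

# Match heats by id prefix instead of hard-coding the number.
heats = [el for el in root.iter("div") if el.get("id", "").startswith("heat-")]

names_by_heat = {}
for heat in heats:
    # The leading "." keeps the search inside this heat element.
    spans = heat.findall("./div/span")
    names_by_heat[heat.get("id")] = [span.text for span in spans]

print(names_by_heat)
```

Grouping the names by heat ID keeps each result tied to the element it came from, so nothing depends on where the numbering starts on a given page.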

Answer 1
0 2022-12-07 02:11:16

Answer 2
0 2022-12-07 03:43:39
