Accessing BeautifulSoup elements in nested HTML

Question

I want to extract the director and actor elements from this parsed HTML output of the IMDb Top 250 page. What should a Python one-liner for it look like? The class "text-muted text-small" appears multiple times, and find_all does not seem to be the best way to go about it.

<span class="ipl-rating-selector__rating-value">0</span>
</div>
<div class="ipl-rating-selector__error ipl-rating-selector__wrapper">
<span>Error: please try again.</span>
</div>
</div>
<div class="ipl-rating-interactive__loader">
<img alt="loading" src="https://m.media-amazon.com/images/G/01/IMDb/spinning-progress.gif"/>
</div>
</div>
</div>
<div class="inline-block ratings-metascore">
<span class="metascore favorable">80        </span>
        Metascore
        </div>
<p class="">
    Two imprisoned men bond over a number of years, finding solace and eventual redemption through acts of common decency.</p>
<p class="text-muted text-small">
    Director:
<a href="/name/nm0001104/">Frank Darabont</a>
<span class="ghost">|</span> 
    Stars:
<a href="/name/nm0000209/">Tim Robbins</a>, 
<a href="/name/nm0000151/">Morgan Freeman</a>, 
<a href="/name/nm0348409/">Bob Gunton</a>, 
<a href="/name/nm0006669/">William Sadler</a>
</p>
<p class="text-muted text-small">
<span class="text-muted">Votes:</span>
<span data-value="2187696" name="nv">2,187,696</span>
<span class="ghost">|</span> <span class="text-muted">Gross:</span>
<span data-value="28,341,469" name="nv">$28.34M</span>
</p>
<div class="wtw-option-standalone" data-baseref="wl_li" data-tconst="tt0111161" data-watchtype="minibar"></div>
</div>

Answer 1

If you are using BeautifulSoup 4.7.0 or higher, you can use the :contains CSS selector. In newer Soup Sieve releases the same selector is spelled :-soup-contains, and the old :contains name is deprecated.

from bs4 import BeautifulSoup
soup = BeautifulSoup(your_html, "html.parser")  # name the parser explicitly
soup.select_one('p:contains("Director:","Stars:")')  # matches either string
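
For reference, here is a minimal, self-contained sketch of that approach. It assumes a recent Soup Sieve release (2.1 or later), where the selector is spelled :-soup-contains. SAMPLE_HTML is a trimmed copy of the markup from the question, and the variable names are illustrative only.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the question's markup, used only for illustration.
SAMPLE_HTML = """
<div>
<p class="text-muted text-small">
    Director:
<a href="/name/nm0001104/">Frank Darabont</a>
<span class="ghost">|</span>
    Stars:
<a href="/name/nm0000209/">Tim Robbins</a>,
<a href="/name/nm0000151/">Morgan Freeman</a>
</p>
<p class="text-muted text-small">
<span class="text-muted">Votes:</span>
<span data-value="2187696" name="nv">2,187,696</span>
</p>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Assumption: soupsieve >= 2.1, which provides :-soup-contains.
# Pick the <p> whose text mentions "Director:" (the Votes <p> does not).
credits_tag = soup.select_one('p:-soup-contains("Director:")')

# Every link inside that <p>: the director first, then the stars.
names = [a.get_text(strip=True) for a in credits_tag.select("a")]
print(names)
```

Note that this returns the director and the stars in one flat list; separating them needs an extra step, since both live in the same p tag.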

Answer 2

This selects the containing p tag and iterates over its children, printing directors and actors separately:

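director_and_stars_tag = soup.select_one('p:contains("Director:")')
directors_flag = True  # True until the "|" separator span is reached

# find_all() with no arguments yields every descendant tag (the <a> and <span> elements)
for name_tag in director_and_stars_tag.find_all():
    if directors_flag:
        # These are Director tags
        if name_tag.name == 'span':
            # The <span class="ghost"> separator marks the end of the directors
            directors_flag = False
        else:
            print('Director: %s' % name_tag.string)
    else:
        # These are Actor tags
        print('Actor: %s' % name_tag.string)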

Output:

Director: Frank Darabont
Actor: Tim Robbins
Actor: Morgan Freeman
Actor: Bob Gunton
Actor: William Sadler
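
A possible variation that avoids the flag variable is to walk the direct children of the p tag and remember the most recent text label ("Director:", "Stars:"). The sketch below is only an illustration, not part of the answer above: split_credits is a hypothetical helper name, SAMPLE_HTML is a trimmed copy of the question's markup, and the code assumes each label is a bare text node ending in a colon, as in that sample.

```python
from bs4 import BeautifulSoup, NavigableString, Tag

# Trimmed copy of the question's markup, used only for illustration.
SAMPLE_HTML = """
<p class="text-muted text-small">
    Director:
<a href="/name/nm0001104/">Frank Darabont</a>
<span class="ghost">|</span>
    Stars:
<a href="/name/nm0000209/">Tim Robbins</a>,
<a href="/name/nm0000151/">Morgan Freeman</a>
</p>
"""


def split_credits(p_tag):
    """Group the link texts of a credits <p> under the label that precedes them.

    Hypothetical helper: assumes labels are plain text nodes ending in ":".
    """
    groups = {}
    label = None
    for node in p_tag.children:
        if isinstance(node, NavigableString):
            text = node.strip()
            if text.endswith(":"):
                # A new label such as "Director:" or "Stars:" starts a new group.
                label = text.rstrip(":")
                groups.setdefault(label, [])
        elif isinstance(node, Tag) and node.name == "a" and label is not None:
            # A name link belongs to the most recently seen label.
            groups[label].append(node.get_text(strip=True))
    return groups


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
credits_by_role = split_credits(soup.find("p"))
print(credits_by_role)
```

Keying on the label text rather than on the separator span means the same loop would also cope with a plural "Directors:" label, at the cost of depending on the exact wording of the page.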

Answer 3

If there's no id or class that you can use to identify those specific elements, you can simply iterate through your items and check whether they contain what you're looking for.
A working example on your HTML sample would be:

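details = soup.find_all("p", attrs={"class": "text-muted text-small"})
for element in details:
    # Only the credits paragraph mentions "Stars"; the Votes/Gross one is skipped
    if "Stars" in element.text:
        # Note: this <p> also holds the director's link, so it is printed too
        stars = element.find_all("a")
        for star in stars:
            print(star.text)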
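
Because the director's link sits in the same p tag as the stars, one possible refinement is to use the "|" separator span as a divider. The sketch below is only a suggestion under stated assumptions: SAMPLE_HTML is a trimmed copy of the question's markup, the variable names are illustrative, and it relies on the separator being a span with class "ghost" placed between the director and the stars, as in that sample.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the question's markup, used only for illustration.
SAMPLE_HTML = """
<div>
<p class="text-muted text-small">
    Director:
<a href="/name/nm0001104/">Frank Darabont</a>
<span class="ghost">|</span>
    Stars:
<a href="/name/nm0000209/">Tim Robbins</a>,
<a href="/name/nm0000151/">Morgan Freeman</a>
</p>
<p class="text-muted text-small">
<span class="text-muted">Votes:</span>
<span data-value="2187696" name="nv">2,187,696</span>
<span class="ghost">|</span> <span class="text-muted">Gross:</span>
<span data-value="28,341,469" name="nv">$28.34M</span>
</p>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# "~" is the CSS general sibling combinator: links that come after the
# separator span inside the same <p>, i.e. the stars only.
stars = [a.get_text(strip=True)
         for a in soup.select("p.text-muted.text-small span.ghost ~ a")]

# Links that come before the first separator are the director(s).
# find_previous_siblings() returns the nearest sibling first, so reverse it
# to get document order back.
ghost = soup.select_one("p.text-muted.text-small span.ghost")
directors = [a.get_text(strip=True)
             for a in ghost.find_previous_siblings("a")][::-1]

print(directors)
print(stars)
```

The stars expression is close to the one-liner the question asks for. The Votes/Gross paragraph also contains a separator span, but no links follow it, so it contributes nothing to the result.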
