Access Beautiful Soup element in nested HTML

Question

I wish to extract the director and actor elements from this parsed HTML output of the IMDb Top 250 page. What should a Python one-liner for this look like? The "text-muted text-small" class appears multiple times, and find_all does not seem to be the optimal way to go about it.

<span class="ipl-rating-selector__rating-value">0</span>
</div>
<div class="ipl-rating-selector__error ipl-rating-selector__wrapper">
<span>Error: please try again.</span>
</div>
</div>
<div class="ipl-rating-interactive__loader">
<img alt="loading" src="https://m.media-amazon.com/images/G/01/IMDb/spinning-progress.gif"/>
</div>
</div>
</div>
<div class="inline-block ratings-metascore">
<span class="metascore favorable">80        </span>
        Metascore
        </div>
<p class="">
    Two imprisoned men bond over a number of years, finding solace and eventual redemption through acts of common decency.</p>
<p class="text-muted text-small">
    Director:
<a href="/name/nm0001104/">Frank Darabont</a>
<span class="ghost">|</span> 
    Stars:
<a href="/name/nm0000209/">Tim Robbins</a>, 
<a href="/name/nm0000151/">Morgan Freeman</a>, 
<a href="/name/nm0348409/">Bob Gunton</a>, 
<a href="/name/nm0006669/">William Sadler</a>
</p>
<p class="text-muted text-small">
<span class="text-muted">Votes:</span>
<span data-value="2187696" name="nv">2,187,696</span>
<span class="ghost">|</span> <span class="text-muted">Gross:</span>
<span data-value="28,341,469" name="nv">$28.34M</span>
</p>
<div class="wtw-option-standalone" data-baseref="wl_li" data-tconst="tt0111161" data-watchtype="minibar"></div>
</div>

Answer 1

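If you are using BeautifulSoup 4.7.0 or higher, you can use the :contains CSS selector, which matches elements by the text they contain. With a comma-separated list, a paragraph matches if it contains any of the strings. Newer Soup Sieve releases deprecate :contains in favor of the equivalent :-soup-contains.

from bs4 import BeautifulSoup

soup = BeautifulSoup(your_html, "html.parser")
soup.select_one('p:contains("Director:", "Stars:")')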
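
Once the paragraph is selected, the director and star links still have to be told apart. The following is a minimal sketch, not part of the original answer, that splits them on the <span class="ghost"> separator. It assumes a Soup Sieve version that supports :-soup-contains (2.1 or later), exactly one separator in the paragraph, and a trimmed copy of the HTML from the question; the variable names are illustrative.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question (assumption: same structure on the real page)
html = """
<p class="text-muted text-small">
    Director:
<a href="/name/nm0001104/">Frank Darabont</a>
<span class="ghost">|</span>
    Stars:
<a href="/name/nm0000209/">Tim Robbins</a>,
<a href="/name/nm0000151/">Morgan Freeman</a>,
<a href="/name/nm0348409/">Bob Gunton</a>,
<a href="/name/nm0006669/">William Sadler</a>
</p>
<p class="text-muted text-small">
<span class="text-muted">Votes:</span>
<span data-value="2187696" name="nv">2,187,696</span>
</p>
"""

soup = BeautifulSoup(html, "html.parser")

# Pick the paragraph by its text, since both paragraphs share the same class
credits = soup.select_one('p:-soup-contains("Director:")')

# The "|" separator sits between the director links and the star links
separator = credits.find("span", class_="ghost")

# find_previous_siblings() returns the nearest sibling first, so reverse it
directors = [a.get_text(strip=True)
             for a in reversed(separator.find_previous_siblings("a"))]
stars = [a.get_text(strip=True) for a in separator.find_next_siblings("a")]

print(directors)
print(stars)
```

If a paragraph has no separator (for example, an entry that lists no stars), credits.find(...) returns None, so real code would need to check for that case first.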

Answer 2

This selects the containing p tag and iterates over its children, printing directors and actors separately:

director_and_stars_tag = soup.select_one('p:contains("Director:")')
directors_flag = True

# find_all() with no arguments returns every descendant tag, in document order
for name_tag in director_and_stars_tag.find_all():
    if directors_flag:
        # Links before the <span class="ghost"> separator are directors
        if name_tag.name == 'span':
            directors_flag = False
        else:
            print('Director: %s' % name_tag.string)
    else:
        # Links after the separator are actors
        print('Actor: %s' % name_tag.string)

Output:

Director: Frank Darabont
Actor: Tim Robbins
Actor: Morgan Freeman
Actor: Bob Gunton
Actor: William Sadler
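
The same walk can be keyed on the text labels themselves rather than on the separator tag. This is a sketch under assumptions, not the answerer's code: split_credits is a hypothetical helper, and it assumes each group of links is preceded by a text node ending in a colon, as in the sample HTML.

```python
from bs4 import BeautifulSoup, NavigableString, Tag

# Trimmed copy of the paragraph from the question
html = """
<p class="text-muted text-small">
    Director:
<a href="/name/nm0001104/">Frank Darabont</a>
<span class="ghost">|</span>
    Stars:
<a href="/name/nm0000209/">Tim Robbins</a>,
<a href="/name/nm0000151/">Morgan Freeman</a>,
<a href="/name/nm0348409/">Bob Gunton</a>,
<a href="/name/nm0006669/">William Sadler</a>
</p>
"""


def split_credits(p_tag):
    """Group link texts under the most recent 'Label:' text node (hypothetical helper)."""
    groups = {}
    label = None
    for node in p_tag.children:
        if isinstance(node, NavigableString):
            # Drop whitespace and the commas that separate the links
            text = node.strip().rstrip(",").strip()
            if text.endswith(":"):
                label = text[:-1]
                groups.setdefault(label, [])
        elif isinstance(node, Tag) and node.name == "a" and label is not None:
            groups[label].append(node.get_text(strip=True))
    return groups


soup = BeautifulSoup(html, "html.parser")

# Locate the paragraph with a plain function filter instead of a CSS selector
p_tag = soup.find(lambda tag: tag.name == "p" and "Director" in tag.get_text())
groups = split_credits(p_tag)
print(groups)
```

Because the groups are named after whatever label text appears, a paragraph labelled "Directors:" would produce a "Directors" key without any change to the code.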

Answer 3

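If there is no id or class that identifies those specific elements, you can iterate over the candidates and check whether each one contains the text you are looking for.
A working example on your HTML sample follows. Note that it prints every link in the matching paragraph, so the director's name is printed along with the stars: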

details = soup.find_all("p", attrs={"class": "text-muted text-small"})
for element in details:
    if "Stars" in element.text:
        stars = element.find_all("a")
        for star in stars:
            print(star.text)
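
To collect only the stars, and in something close to the one-liner the question asks for, the loop can be folded into a list comprehension that starts from the "Stars:" text node. This is a sketch, assuming the star links are siblings that follow that text node, as in the sample HTML; the regular expression and variable names are illustrative.

```python
import re

from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question
html = """
<p class="text-muted text-small">
    Director:
<a href="/name/nm0001104/">Frank Darabont</a>
<span class="ghost">|</span>
    Stars:
<a href="/name/nm0000209/">Tim Robbins</a>,
<a href="/name/nm0000151/">Morgan Freeman</a>,
<a href="/name/nm0348409/">Bob Gunton</a>,
<a href="/name/nm0006669/">William Sadler</a>
</p>
<p class="text-muted text-small">
<span class="text-muted">Votes:</span>
<span data-value="2187696" name="nv">2,187,696</span>
</p>
"""

soup = BeautifulSoup(html, "html.parser")

stars = [
    a.get_text(strip=True)
    # Every paragraph with the shared class is a candidate
    for p in soup.find_all("p", class_="text-muted text-small")
    # Keep only text nodes that look like the "Stars:" label
    for label in p.find_all(string=re.compile(r"Stars?:"))
    # The star links are the <a> siblings that follow the label
    for a in label.find_next_siblings("a")
]
print(stars)
```

The same trick does not work for the director, because every link in the paragraph follows the "Director:" text node; the separator or the label walk is needed there.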
