Python Beautiful Soup 4 Get Children of Element with .select()

Question

The .select() method lets me get elements from a web page based on a CSS selector, but it searches the whole page. How can I use .select() to search only the children of a specific element? E.g.:

<!-- Simplified example of the structure -->
<ul>
    <li>
        <div class="foo">foo content</div>
        <div class="bar">bar content</div>
        <div class="baz">baz content</div>
    </li>
    <li>
        <!-- We can't assume that foo, bar, and baz will always be there -->
        <div class="foo">foo content</div>
        <div class="baz">baz content</div>
    </li>
    <li>
        <div class="foo">foo content</div>
        <div class="bar">bar content</div>
        <div class="baz">baz content</div>
    </li>
</ul>

I want a way to say: for <li> [0], foo contained the value "foo content", bar contained the value "bar content", etc.

Currently my solution is the following:

foos = soup.select("div.foo")
bars = soup.select("div.bar")
bazs = soup.select("div.baz")

for i in range(len(foos)):
    # Positional arguments must come before keyword arguments in format()
    print("{i} contains: {} and {} and {}".format(foos[i], bars[i], bazs[i], i=i))

This works for the most part, but it completely falls apart when an element is missing from one of the <li>s. As the HTML shows, we cannot assume that foo, bar and baz will all be present.

So how would I search only the children of each <li>, so that I could do something like this:

for i in soup.select("li"):
    #how would i do this:
    foo = child_of("li", "div.foo")????
    bar = child_of("li", "div.bar")????
    baz = child_of("li", "div.baz")????

Answer 1

You can use element:nth-of-type(n) like so:

from bs4 import BeautifulSoup

a = """<!-- Simplified example of the structure -->
<ul>
    <li>
        <div class="foo">foo1 content</div>
        <div class="bar">bar1 content</div>
        <div class="baz">baz1 content</div>
    </li>
    <li>
        <!-- We can't assume that foo, bar, and baz will always be there -->
        <div class="foo">foo2 content</div>
        <div class="baz">baz2 content</div>
    </li>
    <li>
        <div class="foo">foo3 content</div>
        <div class="bar">bar3 content</div>
        <div class="baz">baz3 content</div>
    </li>
</ul>
"""

s = BeautifulSoup(a, "html.parser")  # name the parser explicitly
# Pick the second <li>, then search only inside it
s2 = s.select('ul > li:nth-of-type(2)')[0]
foo, bar, baz = s2.select('div.foo'), s2.select('div.bar'), s2.select('div.baz')
print(foo, bar, baz)

Output:

[<div class="foo">foo2 content</div>] [] [<div class="baz">baz2 content</div>]
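
If you only need the text, the two steps can be folded into one selector. Here is a minimal sketch, assuming the same kind of markup as above; text_in_nth_li is a made-up helper name, not part of Beautiful Soup, and it relies on select_one() returning None when nothing matches.

```python
from bs4 import BeautifulSoup

html = """<ul>
    <li><div class="foo">foo1 content</div><div class="bar">bar1 content</div></li>
    <li><div class="foo">foo2 content</div><div class="baz">baz2 content</div></li>
</ul>"""


def text_in_nth_li(soup, n, css_class):
    """Hypothetical helper: text of div.<css_class> inside the n-th <li> (1-based), or None."""
    selector = "ul > li:nth-of-type({}) div.{}".format(n, css_class)
    match = soup.select_one(selector)  # None when the <li> has no such div
    return match.get_text(strip=True) if match is not None else None


soup = BeautifulSoup(html, "html.parser")
print(text_in_nth_li(soup, 2, "foo"))  # foo2 content
print(text_in_nth_li(soup, 2, "bar"))  # None
```

The None result is what lets you tell "missing" apart from "present but empty".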

Answer 2

for li in soup.select('li'):
    foo = li.select('.foo')
    bar = li.select('.bar')
    baz = li.select('.baz')

Each time you iterate over an li tag and call select() on it, the HTML being searched is only that li tag's content, like:

<li>
    <div class="foo">foo content</div>
    <div class="bar">bar content</div>
    <div class="baz">baz content</div>
</li>

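So you can use select() on the li to pick out its children, because the search is limited to the tags that li contains. Note that select() matches descendants at any depth, not only direct children.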
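
Here is a rough sketch of that loop filled out, with a missing element handled. It assumes a Beautiful Soup version whose select() is backed by Soup Sieve (4.7 or later, as far as I know), since it uses :scope to restrict the match to direct children. The sample markup and the rows variable are made up for illustration.

```python
from bs4 import BeautifulSoup

html = """
<ul>
    <li>
        <div class="foo">foo1 content</div>
        <div class="bar">bar1 content</div>
        <div class="baz">baz1 content</div>
    </li>
    <li>
        <div class="foo">foo2 content</div>
        <div class="baz">baz2 content</div>
    </li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

rows = []
for li in soup.select("li"):
    row = {}
    for name in ("foo", "bar", "baz"):
        # ":scope" is the element select_one() was called on (this <li>),
        # so ":scope > div.x" matches direct children only.
        div = li.select_one(":scope > div." + name)
        row[name] = div.get_text(strip=True) if div is not None else None
    rows.append(row)

for i, row in enumerate(rows):
    print("{} contains: {foo} and {bar} and {baz}".format(i, **row))
```

For the second li, bar comes out as None rather than shifting the later values up by one, which is the failure the question describes. Drop the ":scope > " prefix if the divs may be nested deeper inside the li.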

Answer 3

This worked for me; all the foos, bars and bazs are stored in separate lists:

foos = []
bars = []
bazs = []
for i in soup.find_all('li'):
    # Re-parse this <li> on its own so the searches below see only its contents
    soup2 = BeautifulSoup(str(i), 'html.parser')
    print(soup2)
    for div in soup2.find_all('div', {'class': 'foo'}):
        foos.append(div)
    for div in soup2.find_all('div', {'class': 'bar'}):
        bars.append(div)
    for div in soup2.find_all('div', {'class': 'baz'}):
        bazs.append(div)
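
One thing to watch with separate lists: if an li has no bar, the lists end up with different lengths and the indexes stop lining up. A possible variation, sketched under the assumption that you want one slot per li, is to call find() on each li directly (no re-parsing) and store None as a placeholder. The sample markup here is made up.

```python
from bs4 import BeautifulSoup

html = """<ul>
    <li><div class="foo">foo1 content</div><div class="bar">bar1 content</div><div class="baz">baz1 content</div></li>
    <li><div class="foo">foo2 content</div><div class="baz">baz2 content</div></li>
    <li><div class="foo">foo3 content</div><div class="bar">bar3 content</div><div class="baz">baz3 content</div></li>
</ul>"""

soup = BeautifulSoup(html, "html.parser")

foos, bars, bazs = [], [], []
for li in soup.find_all("li"):
    # li.find() searches only inside this <li> and returns None when nothing matches,
    # so every list gets exactly one entry per <li>.
    foos.append(li.find("div", class_="foo"))
    bars.append(li.find("div", class_="bar"))
    bazs.append(li.find("div", class_="baz"))

print(len(foos), len(bars), len(bazs))
print(bars[1])
```

With this layout foos[i], bars[i] and bazs[i] always refer to the same li.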

3 answers

solution1
1 ACCEPTED 2017-01-10 06:11:42

solution2
0 2017-01-10 06:12:43

solution3
0 2017-01-10 07:30:00
