
How to find a child element's child element using Beautiful Soup

I am new to Python. I want to use BeautifulSoup to get the post date from a forum. I tried several approaches but could not get the correct result.

Here is my problem:

<td class = by>
    <cite>...</cite>
    <em>
        <span>2015-11-13</span>
    </em>
    </td>
<td class = ...>...</td>
<td class = by>...</td>
    <cite>...</cite>
    <em><a>...</a></em>
    </td>

There are two <td> elements with the same class, "by", but I only want the date from the first one, which sits inside a <span> tag.

Here is what I have tried, but I have no idea what the problem is:

cat = 1
for span in soup.findAll('span', {'class': "by"}):
    print(span.text)

A generic solution is to iterate over each <td> with class "by" and find the <span> elements inside it:

from bs4 import BeautifulSoup

a="""<td class = by>
    <cite>...</cite>
    <em>
        <span>2015-11-13</span>
    </em>
    </td>
<td class = ...>...</td>
<td class = by>...</td>
    <cite>...</cite>
    <em><a>...</a></em>
    </td>"""

soup = BeautifulSoup(a, 'html.parser')
for item in soup.find_all("td",{"class": "by"}):
    for i in item.find_all("span"):
        print(i.text) # 2015-11-13
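
To see why the attempt in the question printed nothing, it may help to run both searches side by side. The sketch below assumes a tidied copy of the question's markup (quoted attributes, balanced tags); the real forum page may differ. The class "by" belongs to the <td>, not to the <span>, so searching for a <span> with that class finds no elements, while the nested search does.

```python
from bs4 import BeautifulSoup

# Assumption: a cleaned-up copy of the markup from the question.
html = """
<td class="by">
    <cite>...</cite>
    <em><span>2015-11-13</span></em>
</td>
<td class="...">...</td>
<td class="by">
    <cite>...</cite>
    <em><a>...</a></em>
</td>
"""

soup = BeautifulSoup(html, "html.parser")

# The question's attempt: looks for <span class="by">, which does not exist.
attempt = soup.find_all("span", {"class": "by"})

# Nested search: first each <td class="by">, then every <span> inside it.
dates = [
    span.text
    for td in soup.find_all("td", {"class": "by"})
    for span in td.find_all("span")
]

print(len(attempt))  # 0
print(dates)         # ['2015-11-13']
```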

A more straightforward approach is to use a CSS selector:

soup.select('td.by > em > span')[0].text # 2015-11-13
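
Note that ">" is the CSS child combinator, so the selector above only matches a <span> that is a direct child of an <em> that is itself a direct child of the <td>. The following sketch, using hypothetical markup with an extra <b> wrapper and a made-up second date, shows how that differs from the looser descendant selector "td.by span".

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second cell wraps its <span> in an extra <b>.
html = """
<td class="by"><em><span>2015-11-13</span></em></td>
<td class="by"><em><b><span>2015-11-14</span></b></em></td>
"""

soup = BeautifulSoup(html, "html.parser")

# Child combinator: every step must be a direct parent/child pair.
direct = [s.text for s in soup.select("td.by > em > span")]

# Descendant combinator: any <span> nested anywhere inside td.by.
nested = [s.text for s in soup.select("td.by span")]

print(direct)  # ['2015-11-13']
print(nested)  # ['2015-11-13', '2015-11-14']
```

If the forum's markup is not guaranteed to keep the <span> directly under <em>, the descendant form is the more forgiving choice.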

If you only need the first occurrence, then, as suggested by @Jon Clements, you can use select_one:

soup.select_one('td.by > em > span').text
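
One caveat: select_one returns None when nothing matches, so calling .text on the result raises AttributeError for a page without a date. A minimal guarded sketch follows; first_post_date is a hypothetical helper name, not part of Beautiful Soup, and the two sample strings are simplified stand-ins for the question's cells.

```python
from bs4 import BeautifulSoup


def first_post_date(html):
    """Return the text of the first td.by > em > span, or None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    span = soup.select_one("td.by > em > span")
    # select_one gives None when there is no match, so guard before reading text.
    return span.get_text(strip=True) if span is not None else None


with_date = '<td class="by"><em><span> 2015-11-13 </span></em></td>'
without_date = '<td class="by"><em><a>...</a></em></td>'

print(first_post_date(with_date))     # 2015-11-13
print(first_post_date(without_date))  # None
```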

