How to find the children of child elements with Beautiful Soup

Question

I am new to Python. I want to use BeautifulSoup to get the post date from a forum page. I have tried many approaches but can't get the correct result.

Here is my problem:

<td class = by>
    <cite>...</cite>
    <em>
        <span>2015-11-13</span>
    </em>
    </td>
<td class = ...>...</td>
<td class = by>
    <cite>...</cite>
    <em><a>...</a></em>
    </td>

There are two <td> elements with the same class, "by", but I only want the date from the first one, which contains the <span> tag.

Here is what I have tried, but I can't figure out what's wrong:

cat=1
for span in soup.findAll('span', {'class':"by"}):
    print (span.text)
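A quick check shows why this loop prints nothing. The sketch below assumes bs4 with the built-in "html.parser" and uses a trimmed copy of the HTML above with the attribute values quoted. The search asks for <span> tags that carry class "by", but that class sits on the <td> elements.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the question's HTML (attribute values quoted).
html = """
<td class="by">
    <cite>...</cite>
    <em>
        <span>2015-11-13</span>
    </em>
</td>
<td class="...">...</td>
<td class="by">
    <cite>...</cite>
    <em><a>...</a></em>
</td>
"""

soup = BeautifulSoup(html, "html.parser")

# No <span> has class="by", so this search comes back empty.
spans_with_class_by = soup.find_all("span", {"class": "by"})
print(spans_with_class_by)  # []

# The class belongs to the <td> elements, and there are two of them.
tds_with_class_by = soup.find_all("td", {"class": "by"})
print(len(tds_with_class_by))  # 2
```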

Answer 1

A generic solution is to iterate over the <td> elements with class "by" and look for a <span> inside each one:

from bs4 import BeautifulSoup

a="""<td class = by>
    <cite>...</cite>
    <em>
        <span>2015-11-13</span>
    </em>
    </td>
<td class = ...>...</td>
<td class = by>
    <cite>...</cite>
    <em><a>...</a></em>
    </td>"""

soup = BeautifulSoup(a, 'html.parser')
for item in soup.find_all("td",{"class": "by"}):
    for i in item.find_all("span"):
        print(i.text) # 2015-11-13
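The title asks specifically for "children of children," so here is a variant that steps down one level at a time. It is a minimal sketch that assumes bs4 and "html.parser" and reuses a trimmed copy of the sample HTML. Passing `recursive=False` to `find` restricts each search to direct children, giving the td > em > span path. A <td> without that path is skipped, such as the second one, which holds <em><a>...</a></em>.

```python
from bs4 import BeautifulSoup

html = """
<td class="by"><cite>...</cite><em><span>2015-11-13</span></em></td>
<td class="...">...</td>
<td class="by"><cite>...</cite><em><a>...</a></em></td>
"""

soup = BeautifulSoup(html, "html.parser")

dates = []
for td in soup.find_all("td", class_="by"):
    # recursive=False: only look at direct children of the <td>.
    em = td.find("em", recursive=False)
    if em is None:
        continue
    # One more level down: direct <span> children of the <em>.
    span = em.find("span", recursive=False)
    if span is None:
        continue  # this <em> wraps an <a>, not a <span>
    dates.append(span.get_text(strip=True))

print(dates)  # ['2015-11-13']
```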

A more straightforward approach is to use a CSS selector:

soup.select('td.by > em > span')[0].text # 2015-11-13
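In that selector, `>` is the child combinator, so every step must be a direct child. A space is the descendant combinator and matches at any depth. The sketch below contrasts the two, again assuming bs4 and a trimmed copy of the sample HTML. It also shows that `td.by > span` finds nothing, because the <span> is a grandchild of the <td>, not a child. In that case `[0]` would raise an IndexError.

```python
from bs4 import BeautifulSoup

html = """
<td class="by"><cite>...</cite><em><span>2015-11-13</span></em></td>
<td class="...">...</td>
<td class="by"><cite>...</cite><em><a>...</a></em></td>
"""

soup = BeautifulSoup(html, "html.parser")

# Child combinator: td.by, then its child em, then that em's child span.
child_path = soup.select("td.by > em > span")
# Descendant combinator: any span at any depth below td.by.
descendant = soup.select("td.by span")
print([s.get_text() for s in child_path])  # ['2015-11-13']
print([s.get_text() for s in descendant])  # ['2015-11-13']

# The span is not a direct child of td.by, so this matches nothing;
# indexing the empty result with [0] would raise IndexError.
missing = soup.select("td.by > span")
print(len(missing))  # 0
```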

If you only care about the first match, then, as @Jon Clements suggested, you can use select_one:

soup.select_one('td.by > em > span').text
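`select_one` returns the first match, or `None` when nothing matches. Calling `.text` directly on its result therefore raises AttributeError if the page layout changes. A minimal guarded sketch follows, assuming bs4 and the same trimmed sample HTML. The non-matching selector is only an illustration.

```python
from bs4 import BeautifulSoup

html = """
<td class="by"><cite>...</cite><em><span>2015-11-13</span></em></td>
<td class="...">...</td>
<td class="by"><cite>...</cite><em><a>...</a></em></td>
"""

soup = BeautifulSoup(html, "html.parser")

# Check for None before reading the text.
first = soup.select_one("td.by > em > span")
date = first.get_text(strip=True) if first is not None else None
print(date)  # 2015-11-13

# A selector with no match returns None rather than raising.
nothing = soup.select_one("td.by > em > a > span")
print(nothing)  # None
```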

Answer 1
1 2019-03-05 17:51:01
