How do I use BeautifulSoup to get all tags (with their content) under a certain class?

Question

My soup contains an element whose class marks it as the description of a unit.

<div class="ats-description">
 <p>Here is a paragraph</p>
 <div>inner div</div>
 <div>Another div</div>
 <ul>
    <li>Item1</li>
    <li>Item2</li>
    <li>Item3</li>
 </ul>
</div>

I can easily grab this part with soup.select(".ats-description")[0]. Now I want to remove the outer <div class="ats-description"> and keep only the inner tags (to retain the text structure). How can I do that?

soup.select(".ats-description")[0].getText() gives me all the text inside, like this:

'\nHere is a paragraph\ninner div\nAnother div\n\nItem1\nItem2\nItem3\n\n\n'

But that removes all the inner tags, so it's just unstructured text. I want to keep the tags as well.

Answer 1

To get the inner HTML, use the .decode_contents() method:

innerHTML = soup.select_one('.ats-description').decode_contents()
print(innerHTML)
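
As a minimal, self-contained sketch of the same idea, the snippet below runs .decode_contents() on the sample HTML from the question. The "html.parser" choice and the variable names are assumptions made for illustration, not something the answer specifies. It also builds str(tag) for comparison, which would include the wrapping div.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question.
html = """<div class="ats-description">
 <p>Here is a paragraph</p>
 <div>inner div</div>
 <div>Another div</div>
 <ul>
    <li>Item1</li>
    <li>Item2</li>
    <li>Item3</li>
 </ul>
</div>"""

# "html.parser" is an assumption; it ships with Python, so nothing extra is installed.
soup = BeautifulSoup(html, "html.parser")
wrapper = soup.select_one(".ats-description")

# decode_contents() serializes only the children of the tag (its inner HTML).
inner_html = wrapper.decode_contents()

# str(tag) serializes the tag itself as well (its outer HTML), shown for comparison.
outer_html = str(wrapper)

print(inner_html)
```

The printed string keeps the p, div, ul and li tags, but no longer contains the ats-description wrapper.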

Answer 2

Try this: pass a list of tag names to find_all() to match any of them.

from bs4 import BeautifulSoup

html="""<div class="ats-description">
 <p>Here is a paragraph</p>
 <div>inner div</div>
 <div>Another div</div>
 <ul>
    <li>Item1</li>
    <li>Item2</li>
    <li>Item3</li>
 </ul>
</div>"""

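# The 'lxml' parser requires the lxml package to be installed.
soup = BeautifulSoup(html, 'lxml')
# find_all() with a list of names returns every matching tag as a list.
print(soup.select_one("div.ats-description").find_all(['p','div','ul']))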
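
Note that find_all() searches all descendants by default, so with deeper nesting (say, a p inside one of the inner divs) the nested tag would show up twice: once inside its parent and once on its own. A hedged variation, assuming only the direct children of the wrapper are wanted, is to pass recursive=False and join the tags back into a string. The "html.parser" choice and the variable names here are illustrative assumptions.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question.
html = """<div class="ats-description">
 <p>Here is a paragraph</p>
 <div>inner div</div>
 <div>Another div</div>
 <ul>
    <li>Item1</li>
    <li>Item2</li>
    <li>Item3</li>
 </ul>
</div>"""

# "html.parser" is an assumption; it avoids the extra lxml dependency.
soup = BeautifulSoup(html, "html.parser")
wrapper = soup.select_one("div.ats-description")

# name=True matches any tag; recursive=False limits the search to direct children,
# which also skips the whitespace-only text nodes between them.
children = wrapper.find_all(True, recursive=False)

# Each Tag converts to its own HTML with str().
inner_html = "\n".join(str(child) for child in children)

print([child.name for child in children])  # ['p', 'div', 'div', 'ul']
print(inner_html)
```

This does not require knowing the tag names in advance, at the cost of dropping any bare text that sits directly inside the wrapper.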


Answer 1: 1, accepted, 2021-02-17 07:37:59

Answer 2: 0, 2021-02-17 06:30:00
