
How to get all the tags (with content) under a certain class with BeautifulSoup?

My soup contains an element with the class ats-description that holds the description of a unit:

<div class="ats-description">
 <p>Here is a paragraph</p>
 <div>inner div</div>
 <div>Another div</div>
 <ul>
    <li>Item1</li>
    <li>Item2</li>
    <li>Item3</li>
 </ul>
</div>

I can easily grab this part with soup.select(".ats-description")[0]. Now I want to strip the outer <div class="ats-description"> and keep only the inner tags (to retain the text structure). How do I do that?

soup.select(".ats-description")[0].getText() gives me all the text inside it, like this:

'\nHere is a paragraph\ninner div\nAnother div\n\nItem1\nItem2\nItem3\n\n\n'

But it drops all the inner tags, so the result is just unstructured text. I want to keep the tags as well.

To get the innerHTML, use the .decode_contents() method:

innerHTML = soup.select_one('.ats-description').decode_contents()
print(innerHTML)
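Below is a self-contained sketch of the decode_contents() approach using the HTML from the question. It assumes the built-in "html.parser" instead of lxml so no extra package is needed. It also shows Tag.unwrap(), which is not part of the answer above: it removes the outer <div> from the tree in place while leaving its children where they were.

```python
from bs4 import BeautifulSoup

html = """<div class="ats-description">
 <p>Here is a paragraph</p>
 <div>inner div</div>
 <div>Another div</div>
 <ul>
    <li>Item1</li>
    <li>Item2</li>
    <li>Item3</li>
 </ul>
</div>"""

# html.parser ships with Python (assumption: swapped in for lxml here)
soup = BeautifulSoup(html, "html.parser")
description = soup.select_one(".ats-description")

# decode_contents() serializes only the children, i.e. the "innerHTML" as a string
inner_html = description.decode_contents()
print(inner_html)

# unwrap() deletes the tag itself but keeps its children in the parse tree
description.unwrap()
print(soup)
```

decode_contents() gives you a string and leaves the tree untouched; unwrap() modifies the soup, which is handier if you want to keep working with the inner tags as objects.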

Alternatively, pass a list of tag names to find_all() so it matches any of them:

from bs4 import BeautifulSoup

html="""<div class="ats-description">
 <p>Here is a paragraph</p>
 <div>inner div</div>
 <div>Another div</div>
 <ul>
    <li>Item1</li>
    <li>Item2</li>
    <li>Item3</li>
 </ul>
</div>"""

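# 'lxml' requires the lxml package; 'html.parser' is the built-in alternative
soup = BeautifulSoup(html, 'lxml')
# A list argument matches any of the listed tag names
print(soup.select_one("div.ats-description").find_all(['p','div','ul']))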
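find_all() searches all descendants by default, so nested tags are returned alongside their parents. The sketch below (again assuming "html.parser") uses find_all(True), which matches every tag, to show the difference that recursive=False makes: only the direct children of the description div are returned, and each one still carries its own inner markup.

```python
from bs4 import BeautifulSoup

html = """<div class="ats-description">
 <p>Here is a paragraph</p>
 <div>inner div</div>
 <div>Another div</div>
 <ul>
    <li>Item1</li>
    <li>Item2</li>
    <li>Item3</li>
 </ul>
</div>"""

soup = BeautifulSoup(html, "html.parser")
description = soup.select_one("div.ats-description")

# Default (recursive) search: the <li> items inside <ul> are returned too
all_tags = description.find_all(True)
print([tag.name for tag in all_tags])
# → ['p', 'div', 'div', 'ul', 'li', 'li', 'li']

# recursive=False: only the direct children of the description div
top_level = description.find_all(True, recursive=False)
print([tag.name for tag in top_level])
# → ['p', 'div', 'div', 'ul']

# Each top-level tag keeps its markup, e.g. the full <ul> with its items
print(top_level[-1])
```

The same recursive=False flag works with the list form, e.g. find_all(['p', 'div', 'ul'], recursive=False), which avoids picking up a nested <div> twice if the description ever contains one.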
