Remove HTML tags (Python)

Question

I have the following HTML code:

<span><s>Something</s>Anything</span>

I would like to remove the span tag, returning the HTML code:

<s>Something</s>

I am using the BeautifulSoup library:

from bs4 import BeautifulSoup

HTML = "<span><s>Something</s>Anything</span>"
soup = BeautifulSoup(HTML, 'lxml')
soup.span.unwrap()

But that returns: <s>Something</s>Anything

Answer 1

If all you want is the <s> part, why not just select it instead of removing everything else?

For example:

from bs4 import BeautifulSoup

sample = """
<span><s>Something</s>Anything</span>
"""

soup = BeautifulSoup(sample, "lxml")
print(soup.find("s"))

This gets you:

<s>Something</s>

Should you have more of those <span> tags with <s> inside, you could go for something like this:

sample = """
<span><s>Something</s>Anything</span>
<span><s>More of Something</s>Less of Anything</span>
"""
# Re-parse, otherwise soup still holds the earlier one-line sample.
soup = BeautifulSoup(sample, "lxml")
print([t.find("s") for t in soup.find_all("span")])

To get this:

[<s>Something</s>, <s>More of Something</s>]
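One caveat with the list comprehension above: if some <span> has no <s> inside, find() returns None for it and the list ends up with None entries. A minimal sketch of one way around that, assuming the built-in "html.parser" instead of lxml and a made-up third <span> added purely for illustration. select() takes a CSS selector and returns only the tags that match:

```python
from bs4 import BeautifulSoup

# Hypothetical sample: the last <span> has no <s> child.
sample = """
<span><s>Something</s>Anything</span>
<span><s>More of Something</s>Less of Anything</span>
<span>Nothing struck through</span>
"""

soup = BeautifulSoup(sample, "html.parser")

# "span > s" matches only <s> tags that are direct children of a <span>,
# so spans without an <s> simply contribute nothing to the result.
s_tags = soup.select("span > s")
print(s_tags)  # [<s>Something</s>, <s>More of Something</s>]
```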

However, if you remove the <span> tags themselves, you'll end up with empty HTML (at least in this simple case), because extract() takes everything inside the tag with it.

See this:

from bs4 import BeautifulSoup

sample = """
<span><s>Something</s>Anything</span>
"""

soup = BeautifulSoup(sample, "lxml")

for tag in soup.find_all(True):
    if tag.name == "span":
        tag.extract()
print(soup)

Produces this:

<html><head></head><body>
</body></html>

Or, shorter, with a list comprehension:

print([t.extract() for t in soup.find_all("span")])

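Gives: [] (the loop above has already removed the only <span>; on a freshly parsed soup the list would contain the extracted tag instead)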

So, I guess, your best bet is to filter the unwanted tags out.
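If the <s> tags need to stay in the document while the <span> wrapper and its loose text go away, one possible sketch is to drop the text nodes that sit directly inside each <span> and then unwrap it. The helper name keep_only_child_tags is made up for illustration, and the sketch assumes "html.parser" rather than lxml so that no <html>/<body> wrapper is added:

```python
from bs4 import BeautifulSoup, NavigableString


def keep_only_child_tags(html):
    """Remove each <span> and its direct text, keeping its child tags in place."""
    soup = BeautifulSoup(html, "html.parser")
    for span in soup.find_all("span"):
        # Copy the children into a list first, because extract() changes the tree.
        for child in list(span.children):
            # Plain text (and comments) are NavigableString instances; tags are not.
            if isinstance(child, NavigableString):
                child.extract()
        # unwrap() removes the <span> itself but leaves what is still inside it.
        span.unwrap()
    return str(soup)


print(keep_only_child_tags("<span><s>Something</s>Anything</span>"))
# <s>Something</s>
```

Unlike extract() on the <span>, this leaves the rest of the document untouched, which matters once the page has more in it than a single <span>.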

Answer 2

I tried the following code:

s1 = soup.span.s
soup.span.replace_with(s1)  # replaceWith is the deprecated spelling of replace_with
print(soup)

Output:

<html><body><s>Something</s></body></html>
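The same idea can be applied to every <span> in a document. A rough sketch, assuming "html.parser" (so the output has no <html>/<body> wrapper) and a made-up two-span input; spans without an <s> are left alone:

```python
from bs4 import BeautifulSoup

# Hypothetical input with two spans, for illustration only.
html = "<span><s>Something</s>Anything</span><span><s>More</s>Less</span>"
soup = BeautifulSoup(html, "html.parser")

for span in soup.find_all("span"):
    s_tag = span.find("s")
    if s_tag is not None:
        # Swap the whole <span> (text included) for its <s> child.
        span.replace_with(s_tag)

result = str(soup)
print(result)  # <s>Something</s><s>More</s>
```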

Answer 3

Do you want the <s> tag or the innerHTML of <span>?

The first answer gives you code to get the <s> tag, i.e. <s>Something</s>.

To get the innerHTML of <span>, i.e. <s>Something</s>Anything, use:

spanTag.decode_contents()
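For completeness, a small sketch of where spanTag could come from and how decode_contents() differs from converting the tag itself to a string. The variable names are illustrative, and "html.parser" is assumed here instead of lxml:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<span><s>Something</s>Anything</span>", "html.parser")
span_tag = soup.find("span")

# decode_contents() renders only what is inside the tag (its innerHTML).
inner_html = span_tag.decode_contents()
# str() renders the tag together with its contents (its outerHTML).
outer_html = str(span_tag)

print(inner_html)  # <s>Something</s>Anything
print(outer_html)  # <span><s>Something</s>Anything</span>
```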

Answer 4

from bs4 import BeautifulSoup

# Read the whole file, then parse it once the file is closed.
with open('home.html', 'r') as html_file:
    content = html_file.read()

soup = BeautifulSoup(content, 'lxml')
print(soup.prettify())  # prettify() indents the output so it is easier to read

4 answers

Answer 1
1 2020-12-20 09:50:37

Answer 2
0 accepted 2020-12-20 20:18:51

Answer 3
0 2020-12-21 05:39:36

Answer 4
-1 2020-12-21 06:38:29
