Python BeautifulSoup append contents without outermost tag
I'll ask my question with an example that does almost what I want:
from bs4 import BeautifulSoup
p = BeautifulSoup(features='lxml').new_tag('p')
p.append('This is my paragraph. I can add a ')
a = BeautifulSoup(features='lxml').new_tag('a', href='www.google.com')
a.string = 'link to Google'
p.append(a)
p.append(' and finish my paragraph.')
div = BeautifulSoup(features='lxml').new_tag('div')
div.append("I want to append the paragraph content into this div, but only its content without the <p> and </p>, and don't want to escape anything in the contents, i.e. I want to keep the a tag. ")
div.append(p)
print(div.prettify())
The resulting output is:
<div>
I want to append the paragraph content into this div, but only its content without the <p> and </p>, and don't want to escape anything in the contents, i.e. I want to keep the a tag.
<p>
This is my paragraph. I can add a
<a href="www.google.com">
link to Google
</a>
and finish my paragraph.
</p>
</div>
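As the text in the example itself says, I want to append the inner HTML of p without the <p> and </p> tags, while keeping every other tag (in this case, the a tag). So for this example, this is the result I want: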
<div>
I want to append the paragraph content into this div, but only its content without the <p> and </p>, and don't want to escape anything in the contents, i.e. I want to keep the a tag. This is my paragraph. I can add a
<a href="www.google.com">
link to Google
</a>
and finish my paragraph.
</div>
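How can I do this? I tried many options, such as div.append(p.unwrap()) and div.append(p.text), along with a few others, with no luck. div.append(str(p)[3:-4]) doesn't work either, because it escapes every < and > from the inner elements, in this case the a tag.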
You can use unwrap() like this to get the desired result.
import bs4 as bs
s = '''
<div>
I want to append the paragraph content into this div, but only its content without the <p> and </p>, and don't want to escape anything in the contents, i.e. I want to keep the a tag.
<p>
This is my paragraph. I can add a
<a href="www.google.com">
link to Google
</a>
and finish my paragraph.
</p>
</div>
'''
soup = bs.BeautifulSoup(s, 'html.parser')
div_tag = soup.find('div')
div_tag.p.unwrap()
print(soup)
Output
<div>
I want to append the paragraph content into this div, but only its content without the <p> and </p>, and don't want to escape anything in the contents, i.e. I want to keep the a tag.
This is my paragraph. I can add a
<a href="www.google.com">
link to Google
</a>
and finish my paragraph.
</div>
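The answer above parses an HTML string, while the question builds the tags in code. Below is a minimal sketch of how the same idea might apply to that case. It assumes a single shared soup object created with 'html.parser', a shortened placeholder string ('Intro text. ') for the div, and a helper named build_paragraph that is hypothetical and exists only for illustration. As far as I can tell, div.append(p.unwrap()) fails in the question because unwrap() needs the tag to already have a parent, so the first variant appends the <p> before unwrapping it. The second variant never attaches the <p> and moves its children over instead.

```python
from bs4 import BeautifulSoup

# One shared soup used only as a tag factory (an assumption of this sketch).
soup = BeautifulSoup('', 'html.parser')

def build_paragraph():
    """Hypothetical helper: rebuild the <p> from the question."""
    p = soup.new_tag('p')
    p.append('This is my paragraph. I can add a ')
    a = soup.new_tag('a', href='www.google.com')
    a.string = 'link to Google'
    p.append(a)
    p.append(' and finish my paragraph.')
    return p

# Variant 1: attach the <p> first, then unwrap it in place.
div1 = soup.new_tag('div')
div1.append('Intro text. ')  # placeholder for the question's long sentence
p1 = build_paragraph()
div1.append(p1)
p1.unwrap()  # p1 now has a parent, so unwrap() can replace it with its children

# Variant 2: move the children over one by one, never attaching the <p>.
div2 = soup.new_tag('div')
div2.append('Intro text. ')
p2 = build_paragraph()
for child in list(p2.contents):  # iterate over a copy: append() detaches each child from p2
    div2.append(child)

print(div1)
print(div2)
```

Both variants keep the a tag unescaped and drop only the <p> wrapper. The second one leaves p2 empty afterwards, which is fine if the paragraph was only a temporary container.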