
Python BeautifulSoup append contents without outermost tag

I'll ask my question using an example that does "almost what I want":

from bs4 import BeautifulSoup

p = BeautifulSoup(features='lxml').new_tag('p')
p.append('This is my paragraph. I can add a ')
a = BeautifulSoup(features='lxml').new_tag('a', href='www.google.com')
a.string = 'link to Google'
p.append(a)
p.append(' and finish my paragraph.')

div = BeautifulSoup(features='lxml').new_tag('div')
div.append("I want to append the paragraph content into this div, but only its content without the <p> and </p>, and don't want to escape anything in the contents, i.e. I want to keep the a tag. ")
div.append(p)

print(div.prettify())

As a result, print(div.prettify()) shows:

<div>
 I want to append the paragraph content into this div, but only its content without the &lt;p&gt; and &lt;/p&gt;, and don't want to escape anything in the contents, i.e. I want to keep the a tag.
 <p>
  This is my paragraph. I can add a
  <a href="www.google.com">
   link to Google
  </a>
  and finish my paragraph.
 </p>
</div>

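As the text in the example itself says, I want to append the inner HTML of p without the <p></p> tags, while keeping all other tags (in this case the a tag). So for this example, this is the result I want to get: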

<div>
 I want to append the paragraph content into this div, but only its content without the &lt;p&gt; and &lt;/p&gt;, and don't want to escape anything in the contents, i.e. I want to keep the a tag. This is my paragraph. I can add a
  <a href="www.google.com">
   link to Google
  </a>
  and finish my paragraph.
</div>

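How can I do this? I tried many options, such as div.append(p.unwrap()), div.append(p.text), and a few others, with no luck. div.append(str(p)[3:-4]) does not work because it escapes every < and > from the inner elements, in this case the a tag.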
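The escaping problem with the string-slicing attempt can be reproduced with a minimal sketch. It assumes the built-in 'html.parser' backend instead of lxml and uses a shortened paragraph; the variable name inner_html is only illustrative. Because the sliced value is a plain str, BeautifulSoup stores it as text rather than as markup, so the a tag is lost:

```python
from bs4 import BeautifulSoup

# An empty soup used only as a factory for new tags (assumption: html.parser).
soup = BeautifulSoup("", "html.parser")

p = soup.new_tag("p")
a = soup.new_tag("a", href="www.google.com")
a.string = "link to Google"
p.append(a)

div = soup.new_tag("div")

# Slice off the leading "<p>" and trailing "</p>" from the serialized tag.
inner_html = str(p)[3:-4]

# Appending a plain string creates a text node, not parsed markup,
# so the angle brackets are escaped when the div is serialized.
div.append(inner_html)

print(div)
```

Here div contains a single text node and no a element, which matches the behavior described in the question.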

You can use unwrap() like this to get the desired result.

import bs4 as bs

s = '''
<div>
 I want to append the paragraph content into this div, but only its content without the &lt;p&gt; and &lt;/p&gt;, and don't want to escape anything in the contents, i.e. I want to keep the a tag.
 <p>
  This is my paragraph. I can add a
  <a href="www.google.com">
   link to Google
  </a>
  and finish my paragraph.
 </p>
</div>
'''
soup = bs.BeautifulSoup(s, 'html.parser')

div_tag = soup.find('div')
div_tag.p.unwrap()

print(soup)

Output

<div>
 I want to append the paragraph content into this div, but only its content without the &lt;p&gt; and &lt;/p&gt;, and don't want to escape anything in the contents, i.e. I want to keep the a tag.
 
  This is my paragraph. I can add a
  <a href="www.google.com">
   link to Google
  </a>
  and finish my paragraph.
 
</div>
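The same idea can be applied to tags built in code, as in the question, rather than to a parsed string. The following is a sketch under a few assumptions: it uses 'html.parser' instead of lxml, shortens the div text, and introduces a hypothetical helper build_paragraph() purely for illustration. It shows two variants: appending p first and then calling unwrap() on it (unwrap() needs the tag to already have a parent), and moving the children of p into the div one by one.

```python
from bs4 import BeautifulSoup

# An empty soup used only as a factory for new tags (assumption: html.parser).
soup = BeautifulSoup("", "html.parser")


def build_paragraph():
    """Hypothetical helper: rebuild the <p> from the question."""
    p = soup.new_tag("p")
    p.append("This is my paragraph. I can add a ")
    a = soup.new_tag("a", href="www.google.com")
    a.string = "link to Google"
    p.append(a)
    p.append(" and finish my paragraph.")
    return p


# Variant 1: attach the paragraph, then unwrap it in place.
div_unwrap = soup.new_tag("div")
div_unwrap.append("Intro text. ")
p1 = build_paragraph()
div_unwrap.append(p1)  # p1 now has a parent, so unwrap() can replace it
p1.unwrap()            # removes <p>...</p> but keeps its children in the div

# Variant 2: move each child of the paragraph into the div.
div_move = soup.new_tag("div")
div_move.append("Intro text. ")
p2 = build_paragraph()
# Iterate over a copy, because append() detaches each child from p2.
for child in list(p2.contents):
    div_move.append(child)

print(div_unwrap)
print(div_move)
```

In both variants the a tag stays a real element inside the div, so nothing is escaped. The second variant leaves p2 empty and never attaches it to the div.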

