How to get BeautifulSoup 4 to respect a self-closing tag?

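This question is specific to BeautifulSoup4, which makes it different from these earlier questions:

Why is BeautifulSoup modifying my self-closing elements?

selfClosingTags in BeautifulSoup

Since BeautifulStoneSoup (the previous XML parser) is gone, how can I get bs4 to respect a new self-closing tag? For example: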

import bs4
S = '''<foo> <bar a="3"/> </foo>'''
soup = bs4.BeautifulSoup(S, selfClosingTags=['bar'])

print(soup.prettify())

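This does not self-close the bar tag; it emits a warning instead. What is the tree builder that bs4 refers to, and how do I get the tag to self-close?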

/usr/local/lib/python2.7/dist-packages/bs4/__init__.py:112: UserWarning: BS4 does not respect the selfClosingTags argument to the BeautifulSoup constructor. The tree builder is responsible for understanding self-closing tags.
  "BS4 does not respect the selfClosingTags argument to the "
<html>
 <body>
  <foo>
   <bar a="3">
   </bar>
  </foo>
 </body>
</html>

To parse XML, pass "xml" as the second argument to the BeautifulSoup constructor.

soup = bs4.BeautifulSoup(S, 'xml')

You will need to have lxml installed.

You no longer need to pass selfClosingTags:
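If you are not sure whether lxml is available, a minimal sketch like the following may help. It assumes that bs4 raises `bs4.FeatureNotFound` when no installed parser provides the "xml" feature; the `pip install lxml` hint in the message is only a suggestion and depends on your environment.

```python
import bs4

S = '''<foo> <bar a="3"/> </foo>'''

try:
    # "xml" asks bs4 for an XML tree builder, which lxml provides.
    soup = bs4.BeautifulSoup(S, "xml")
except bs4.FeatureNotFound:
    # No installed parser offers the "xml" feature (assumption: lxml is missing).
    soup = None
    print("No XML parser found; installing lxml (e.g. pip install lxml) should fix this.")
else:
    print(soup.prettify())
```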

In [1]: import bs4
In [2]: S = '''<foo> <bar a="3"/> </foo>'''
In [3]: soup = bs4.BeautifulSoup(S, 'xml')
In [4]: print(soup.prettify())
<?xml version="1.0" encoding="utf-8"?>
<foo>
 <bar a="3"/>
</foo>
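To see what "the tree builder is responsible" means in practice, here is a rough sketch that parses the same string with two different builders and inspects the bar tag. The helper name `describe_bar` is made up for illustration. It relies on `Tag.is_empty_element`, which as far as I know reports whether bs4 will serialize a tag in self-closing form. The "xml" case is skipped if lxml is not installed.

```python
import bs4

S = '''<foo> <bar a="3"/> </foo>'''

def describe_bar(markup, parser):
    """Hypothetical helper: parse markup and report how <bar> is treated."""
    soup = bs4.BeautifulSoup(markup, parser)
    bar = soup.find("bar")
    # is_empty_element: the tag has no contents and the builder allows it to self-close.
    return bar.is_empty_element, str(bar)

results = {}
for parser in ("html.parser", "xml"):
    try:
        results[parser] = describe_bar(S, parser)
    except bs4.FeatureNotFound:
        # The requested builder is not available (e.g. lxml missing for "xml").
        results[parser] = None

for parser, result in results.items():
    print(parser, result)
```

With an HTML builder, bar is not one of the known void HTML tags, so it is written out with a separate closing tag. The XML builder lets any empty tag self-close, which is why selfClosingTags is no longer needed.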
