Inline parsing in BeautifulSoup in Python
I am writing an HTML document with BeautifulSoup, and I would like it not to split inline text (such as the text inside a <p> tag) across multiple lines. The problem is that parsing <p>a<span>b</span>c</p> and calling prettify gives me this output:
<p>
a
<span>
b
</span>
c
</p>
and now the rendered HTML shows spaces between a, b and c, which I do not want. How do I avoid this?
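To see where those spaces come from, here is a small sketch. It assumes BeautifulSoup 4 (the bs4 package) on Python 3 with the built-in "html.parser" parser, not the BeautifulSoup 3 API used in the answers. It re-parses the prettified markup and compares its text content with the original. The extra newlines and indentation become real text nodes, and a browser collapses each run of whitespace into a single space, which produces the visible gaps.

```python
from bs4 import BeautifulSoup

html = "<p>a<span>b</span>c</p>"
soup = BeautifulSoup(html, "html.parser")

# prettify() puts every tag and text node on its own indented line.
pretty = soup.prettify()
print(pretty)

# Re-parse the prettified markup and compare the text content of both versions.
original_text = soup.get_text()
pretty_text = BeautifulSoup(pretty, "html.parser").get_text()
print(repr(original_text))  # 'abc'
print(repr(pretty_text))    # a, b and c are now separated by newlines and indentation
```

The exact indentation prettify emits can differ between bs4 versions, but the whitespace around each text node is always there.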
How about not using prettify at all?
import BeautifulSoup  # BeautifulSoup 3 module
BeautifulSoup.BeautifulSoup('<p>a<span>b</span>c</p>').renderContents()
outputs the original HTML with no extra spaces. You can use a tool such as Firebug to inspect the document's structure later, so there is no need to prettify it when you build it.
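The snippet above uses the BeautifulSoup 3 API. Assuming BeautifulSoup 4 (the bs4 package) on Python 3, a roughly equivalent sketch uses decode_contents(), which serializes the soup's children as a str without adding any whitespace (encode_contents() does the same but returns bytes). The choice of "html.parser" is an assumption; it is Python's built-in parser and does not wrap the fragment in <html>/<body> tags.

```python
from bs4 import BeautifulSoup

html = "<p>a<span>b</span>c</p>"
# "html.parser" keeps the fragment as-is, without adding <html>/<body> wrappers.
soup = BeautifulSoup(html, "html.parser")

# decode_contents() with no indent level serializes the children compactly.
rendered = soup.decode_contents()
print(rendered)  # <p>a<span>b</span>c</p>
```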
I'd just do:
from BeautifulSoup import BeautifulSoup
ht = '<p>a<span>b</span>c</p>'
soup = BeautifulSoup(ht)
print soup
and avoid getting any extra whitespace. After all, prettify's job is precisely to adjust whitespace so that the structure of the HTML parse tree is easy to see!
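The code above is Python 2 with BeautifulSoup 3. A sketch of the same idea, assuming Python 3 and BeautifulSoup 4 (bs4) with the built-in "html.parser", shows the contrast directly: str(soup), which is what print(soup) uses, keeps the markup compact, while prettify() is the call that adds newlines and indentation.

```python
from bs4 import BeautifulSoup

ht = "<p>a<span>b</span>c</p>"
soup = BeautifulSoup(ht, "html.parser")

# str(soup) (used by print(soup)) serializes the tree without adding whitespace.
compact = str(soup)
print(compact)  # <p>a<span>b</span>c</p>

# prettify() is the method that inserts newlines and indentation between nodes.
print(soup.prettify())
```

Keeping prettify() for debugging output only, and writing str(soup) to the final document, avoids the extra spaces.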