Inline parsing in BeautifulSoup in Python

I am writing an HTML document with BeautifulSoup, and I would like it not to split inline text (such as the text inside a <p> tag) across multiple lines. The problem is that calling prettify on <p>a<span>b</span>c</p> gives me this output:

<p>
 a
 <span>
  b
 </span>
 c
</p>

and now the rendered HTML shows spaces between a, b and c, which I do not want. How do I avoid this?
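A minimal sketch that reproduces the problem with the current BeautifulSoup 4 package (`bs4`) instead of the BeautifulSoup 3 module used in the answers below. It assumes `bs4` is installed and uses the built-in `html.parser` backend, so the fragment is not wrapped in `<html>`/`<body>`. `prettify()` puts every tag and every text node on its own indented line:

```python
from bs4 import BeautifulSoup

# Assumes BeautifulSoup 4 (the bs4 package) is installed.
html = '<p>a<span>b</span>c</p>'
soup = BeautifulSoup(html, 'html.parser')

# prettify() places each tag and each text node on its own indented line;
# those newlines and indents are the whitespace that shows up between a, b and c.
pretty = soup.prettify()
print(pretty)
```

A browser collapses the newlines and indentation inside the `<p>` into single spaces, which is why the rendered text reads "a b c" rather than "abc".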

How about not using prettify at all?

import BeautifulSoup
BeautifulSoup.BeautifulSoup('<p>a<span>b</span>c</p>').renderContents()

outputs the original HTML with no extra spaces. You can use a tool such as Firebug to inspect the document's structure later, with no need to prettify it at construction time.
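In BeautifulSoup 4, `renderContents()` is kept only as a deprecated method for BS3 compatibility. A hedged sketch of the non-prettified alternatives, assuming `bs4` is installed: `str(soup)` serializes the document exactly as parsed, and `decode_contents()` / `encode_contents()` return a tag's inner HTML as `str` or `bytes` without adding whitespace.

```python
from bs4 import BeautifulSoup

# Assumes BeautifulSoup 4 (bs4); 'html.parser' is the stdlib-backed parser
# and does not wrap the fragment in <html><body>.
soup = BeautifulSoup('<p>a<span>b</span>c</p>', 'html.parser')

whole = str(soup)                        # full document, no added whitespace
inner = soup.p.decode_contents()         # inner HTML of <p> as str
inner_bytes = soup.p.encode_contents()   # same inner HTML, encoded as UTF-8 bytes

print(whole)   # <p>a<span>b</span>c</p>
print(inner)   # a<span>b</span>c
```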

I'd just do:

from BeautifulSoup import BeautifulSoup

ht = '<p>a<span>b</span>c</p>'
soup = BeautifulSoup(ht)
print soup

and avoid any extra whitespace. After all, prettify's job is precisely to adjust whitespace so that the structure of the HTML parse tree is clearly visible!
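One way to confirm that prettify changes the document itself, not just how it is displayed, is to reparse its output and compare the text content. A small sketch, again assuming BeautifulSoup 4 (`bs4`) with the `html.parser` backend: `get_text()` on the original fragment gives `'abc'`, while the reparsed prettified version carries the inserted newlines and indentation.

```python
from bs4 import BeautifulSoup

# Assumes BeautifulSoup 4 (bs4) with the stdlib 'html.parser' backend.
html = '<p>a<span>b</span>c</p>'
original = BeautifulSoup(html, 'html.parser')

# Round-trip through prettify(): the added whitespace becomes real text nodes.
reparsed = BeautifulSoup(original.prettify(), 'html.parser')

original_text = original.p.get_text()   # 'abc'
reparsed_text = reparsed.p.get_text()   # a, b, c separated by newlines and indentation

print(repr(original_text))
print(repr(reparsed_text))
```

Serializing with `str(soup)` instead keeps the text exactly as parsed, which is why the answers recommend skipping prettify when the output is meant to be rendered.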
