Inline parsing in BeautifulSoup in Python

Question

I am writing an HTML document with BeautifulSoup, and I would like it not to split inline text (such as the text within a tag) into multiple lines. The issue is that parsing <p>a<span>b</span>c</p> and calling prettify gives me the output

<p>
 a
 <span>
  b
 </span>
 c
</p>

and now the HTML displays spaces between a, b and c, which I do not want. How do I avoid this?

Answer 1

How about not using prettify at all?

import BeautifulSoup  # BeautifulSoup 3 module, as used in this answer
BeautifulSoup.BeautifulSoup('<p>a<span>b</span>c</p>').renderContents()

outputs the original HTML with no extra spaces. You can use a tool such as Firebug to take a closer look at the document's structure later, with no need to 'prettify' it at construction time.
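
For readers on the newer bs4 package (an assumption; the answer above uses the older BeautifulSoup 3 module), the closest counterpart to renderContents appears to be decode_contents, which returns the inner markup as a string. A minimal sketch, with the 'html.parser' backend chosen only so that nothing beyond bs4 is needed:

```python
from bs4 import BeautifulSoup

# Assumption: bs4 is installed; 'html.parser' is the stdlib-backed parser.
soup = BeautifulSoup('<p>a<span>b</span>c</p>', 'html.parser')

# decode_contents() serializes a tag's children without adding whitespace.
inner = soup.p.decode_contents()   # contents of the <p> tag only
whole = soup.decode_contents()     # contents of the whole document

print(inner)
print(whole)
```

Neither string contains the line breaks that prettify would add.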

Answer 2

I'd just do:

from BeautifulSoup import BeautifulSoup

ht = '<p>a<span>b</span>c</p>'
soup = BeautifulSoup(ht)
print soup

and avoid getting any extra whitespace. After all, prettify's job is exactly to adjust whitespace so that it clearly shows the structure of the HTML parse tree!
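
To make the difference concrete, here is a small sketch using the newer bs4 package (an assumption, since the answers above target BeautifulSoup 3). It serializes the same tree both ways and then reparses each result, showing that the whitespace prettify adds becomes part of the text:

```python
from bs4 import BeautifulSoup

html = '<p>a<span>b</span>c</p>'
soup = BeautifulSoup(html, 'html.parser')

compact = str(soup)        # serialized as parsed, no added whitespace
pretty = soup.prettify()   # one node per line, indented

# Reparse both outputs and compare the text a browser would work from.
text_compact = BeautifulSoup(compact, 'html.parser').get_text()
text_pretty = BeautifulSoup(pretty, 'html.parser').get_text()

print(repr(text_compact))
print(repr(text_pretty))
```

The compact form round-trips to the text abc, while the prettified form carries newlines and indentation between the letters, which a browser collapses into the visible spaces described in the question.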


Answer 1: score 2, accepted, 2010-01-22 23:16:15

Answer 2: score 0, 2010-01-23 03:17:18
