
lxml (or lxml.html): print tree structure

I'd like to print the tree structure of an etree (built from an HTML document) in a distinguishable way, meaning that two etrees with different structures should print differently.

By structure I mean the "shape" of the tree: all the tags, but no attributes and no text content.

Any ideas? Is there something in lxml that does this?

If not, I guess I have to iterate through the whole tree and build a string from it. Any idea how to represent the tree in a compact way? (Compactness is the less important requirement.)

FYI, the output is not meant to be read by a person; it will be stored and hashed so that I can tell several HTML templates apart.

Thanks

Maybe just run some XSLT over the source XML to strip everything but the tags; it is then easy enough to use etree.tostring to get a string you can hash:

from lxml import etree as ET

def pp(e):
    # Pretty-print an element as text (not bytes), followed by a blank line.
    print(ET.tostring(e, pretty_print=True, encoding="unicode"))
    print()

root = ET.XML("""\
<project id="8dce5d94-4273-47ef-8d1b-0c7882f91caa" kpf_version="4">
<livefolder id="8744bc67-1b9e-443d-ba9f-96e1d0007ba8" idref="707cd68a-33b5-4051-9e40-8ba686c2fdb8">Mooo</livefolder>
<livefolder id="8744bc67-1b9e-443d-ba9f" idref="707cd68a-33b5-4051-9e40-8ba686c2fdb8" />
<preference-set idref="8dce5d94-4273-47ef-8d1b-0c7882f91caa">
  <boolean id="import_live">0</boolean>
</preference-set>
</project>
""")
pp(root)


xslt = ET.XML("""\
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="*">
    <xsl:copy>
      <xsl:apply-templates select="*"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
""")
tr = ET.XSLT(xslt)

doc2 = tr(root)
root2 = doc2.getroot()
pp(root2)

This gives the output:

<project id="8dce5d94-4273-47ef-8d1b-0c7882f91caa" kpf_version="4">
  <livefolder id="8744bc67-1b9e-443d-ba9f-96e1d0007ba8" idref="707cd68a-33b5-4051-9e40-8ba686c2fdb8">Mooo</livefolder>
  <livefolder id="8744bc67-1b9e-443d-ba9f" idref="707cd68a-33b5-4051-9e40-8ba686c2fdb8"/>
  <preference-set idref="8dce5d94-4273-47ef-8d1b-0c7882f91caa">
    <boolean id="import_live">0</boolean>
  </preference-set>
</project>

<project>
  <livefolder/>
  <livefolder/>
  <preference-set>
    <boolean/>
  </preference-set>
</project>
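
The question also mentions walking the tree by hand and hashing the result. As an alternative to the XSLT above (not part of the original answer), here is a minimal sketch of that approach for HTML input. The helper names `shape` and `shape_hash`, the `tag(child,child)` notation, and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib

from lxml import html


def shape(element):
    """Return a compact string of the tag structure, ignoring attributes and text."""
    # Comments and processing instructions have a non-string .tag; skip them.
    children = [child for child in element if isinstance(child.tag, str)]
    return element.tag + "(" + ",".join(shape(child) for child in children) + ")"


def shape_hash(element):
    """Hash the structure string so it can be stored and compared cheaply."""
    return hashlib.sha256(shape(element).encode("utf-8")).hexdigest()


# Same structure, different attributes and text.
a = html.fromstring("<div class='x'><p>Hello</p><p>World <b>!</b></p></div>")
b = html.fromstring("<div id='y'><p>Other</p><p>text <b>?</b></p></div>")
# Different structure.
c = html.fromstring("<div><p>Hello</p><ul><li>World</li></ul></div>")

print(shape(a))                        # div(p(),p(b()))
print(shape_hash(a) == shape_hash(b))  # True
print(shape_hash(a) == shape_hash(c))  # False
```

Because every element writes its own parentheses, two trees produce the same string only if they have the same tags in the same nesting and order. The function is recursive, so a very deeply nested document could hit Python's recursion limit; an explicit stack would avoid that.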
