I'd like to print out the tree structure of an etree (formed from an html document) in a differentiable way (means that two etrees should print out differently).
What I mean by structure is the "shape" of the tree, which basically means all the tags but no attribute and no text content.
Any idea? Is there something in lxml to do that?
If not, I guess I have to iterate through the whole tree and construct a string from that. Any idea how to represent the tree in a compact way? (the "compact" feature is less relevant)
FYI it is not intended to be looked at, but to be stored and hashed to be able to make differences between several html templates.
Thanks
Maybe just run some XSLT over the source XML to strip everything but the tags, it's then easy enough to use etree.tostring
to get a string you could hash...
from lxml import etree as ET
def pp(e):
print ET.tostring(e, pretty_print=True)
print
root = ET.XML("""\
<project id="8dce5d94-4273-47ef-8d1b-0c7882f91caa" kpf_version="4">
<livefolder id="8744bc67-1b9e-443d-ba9f-96e1d0007ba8" idref="707cd68a-33b5-4051-9e40-8ba686c2fdb8">Mooo</livefolder>
<livefolder id="8744bc67-1b9e-443d-ba9f" idref="707cd68a-33b5-4051-9e40-8ba686c2fdb8" />
<preference-set idref="8dce5d94-4273-47ef-8d1b-0c7882f91caa">
<boolean id="import_live">0</boolean>
</preference-set>
</project>
""")
pp(root)
xslt = ET.XML("""\
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="*">
<xsl:copy>
<xsl:apply-templates select="*"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
""")
tr = ET.XSLT(xslt)
doc2 = tr(root)
root2 = doc2.getroot()
pp(root2)
Gives you the output:
<project id="8dce5d94-4273-47ef-8d1b-0c7882f91caa" kpf_version="4">
<livefolder id="8744bc67-1b9e-443d-ba9f-96e1d0007ba8" idref="707cd68a-33b5-4051-9e40-8ba686c2fdb8">Mooo</livefolder>
<livefolder id="8744bc67-1b9e-443d-ba9f" idref="707cd68a-33b5-4051-9e40-8ba686c2fdb8"/>
<preference-set idref="8dce5d94-4273-47ef-8d1b-0c7882f91caa">
<boolean id="import_live">0</boolean>
</preference-set>
</project>
<project>
<livefolder/>
<livefolder/>
<preference-set>
<boolean/>
</preference-set>
</project>
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.