How to use lxml and python to pretty print a subtree of an xml file?

I have the following code, which uses Python with lxml to pretty print the file example.xml:

python -c '
from lxml import etree;
from sys import stdout, stdin;

parser=etree.XMLParser(remove_blank_text=True, strip_cdata=False);
tree=etree.parse(stdin, parser)
tree.write(stdout, pretty_print = True)' < example.xml

I'm using lxml because it is important that I preserve the fidelity of the original file, including the CDATA sections. Here's the file example.xml that I'm running it on:

<projects><project name="helloworld" threads="1" pubsub="auto" heartbeat-interval="1">
<description><![CDATA[This is a sample project]]></description>  <metadata>    <meta id="studioUploadedBy">anonymous</meta>
<meta id="studioUploaded">1550863090439</meta>    <meta id="studioModifiedBy">anonymous</meta>
<meta id="studioModified">1550863175384</meta>    <meta id="studioTags">helloworld</meta>
<meta id="studioVersionNotes">This is just a sample project</meta>    <meta id="layout">{"cq1":{"Source1":{"x":50,"y":-290}}}</meta>
</metadata>  <contqueries>    <contquery name="cq1">      <windows>        <window-source pubsub="true" name="Source1">
<schema>            <fields>              <field name="name" type="string" key="true"/>            </fields>
</schema>        </window-source>      </windows>    </contquery>  </contqueries> </project></projects>

It generates the following output:

<projects>
  <project name="helloworld" threads="1" pubsub="auto" heartbeat-interval="1">
    <description><![CDATA[This is a sample project]]></description>
    <metadata>
      <meta id="studioUploadedBy">anonymous</meta>
      <meta id="studioUploaded">1550863090439</meta>
      <meta id="studioModifiedBy">anonymous</meta>
      <meta id="studioModified">1550863175384</meta>
      <meta id="studioTags">helloworld</meta>
      <meta id="studioVersionNotes">This is just a sample project</meta>
      <meta id="layout">{"cq1":{"Source1":{"x":50,"y":-290}}}</meta>
    </metadata>
    <contqueries>
      <contquery name="cq1">
        <windows>
          <window-source pubsub="true" name="Source1">
            <schema>
              <fields>
                <field name="name" type="string" key="true"/>
              </fields>
            </schema>
          </window-source>
        </windows>
      </contquery>
    </contqueries>
  </project>
</projects>

This is nearly what I want, except that I'd like to get only a subtree: just the part from <project name="helloworld"...> through </project>. How would I modify the lxml-based Python code above to do that?

We can capture a nested Element using xpath. Element objects do not provide the same .write() capability, so we'll need a different output mechanism.

How about...

python -c '
from lxml import etree;
from sys import stdout, stdin;

parser=etree.XMLParser(remove_blank_text=True, strip_cdata=False);
tree=etree.parse(stdin, parser)
# assuming there will be exactly 1 project
project=tree.xpath("project")[0]
# encoding="unicode" yields text rather than bytes, so print() also works on Python 3
print(etree.tostring(project, pretty_print=True, encoding="unicode"))' < example.xml
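As a rough standalone sketch of the same idea for Python 3, the snippet below parses a shortened inline stand-in for example.xml instead of reading stdin. The `SAMPLE` bytes and the `pretty_subtree` helper are names made up for illustration, not part of lxml. Passing `encoding="unicode"` asks `etree.tostring` for a `str` rather than bytes.

```python
from io import BytesIO

from lxml import etree

# Shortened, hypothetical stand-in for example.xml.
SAMPLE = (
    b'<projects><project name="helloworld" threads="1">'
    b'<description><![CDATA[This is a sample project]]></description>'
    b'<metadata><meta id="studioTags">helloworld</meta></metadata>'
    b'</project></projects>'
)


def pretty_subtree(xml_bytes, path):
    """Return the first element matching the XPath `path`, pretty printed."""
    # strip_cdata=False keeps CDATA sections intact on output.
    parser = etree.XMLParser(remove_blank_text=True, strip_cdata=False)
    tree = etree.parse(BytesIO(xml_bytes), parser)
    matches = tree.xpath(path)
    if not matches:
        raise LookupError("no element matches %r" % path)
    # encoding="unicode" makes tostring return text instead of bytes.
    return etree.tostring(matches[0], pretty_print=True, encoding="unicode")


if __name__ == "__main__":
    print(pretty_subtree(SAMPLE, "project"), end="")
```

Checking for an empty match list avoids the bare IndexError that `[0]` would raise if the document had no project element.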

You can use tree.find to get the XML element you want to extract, then wrap it in an ElementTree. You can then call write on the resulting ElementTree (et in this case).

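python -c '
from lxml import etree;
from sys import stdout, stdin;
parser=etree.XMLParser(remove_blank_text=True,strip_cdata=False);
tree=etree.parse(stdin, parser)
# find returns the first matching child of the root element
e = tree.find("project")
# wrap the element in its own ElementTree so that write() is available
et = etree.ElementTree(e)
et.write(stdout, pretty_print = True)' < example.xml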
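A small sketch of the same wrap-and-write approach as a Python 3 script, assuming the output target is a binary stream. lxml's `write` emits bytes, so on Python 3 that would be something like `sys.stdout.buffer` or a `BytesIO` rather than `sys.stdout` itself. The inline `SAMPLE` document and the `write_subtree` helper are hypothetical names used only for illustration.

```python
from io import BytesIO

from lxml import etree

# Shortened, hypothetical stand-in for example.xml.
SAMPLE = (
    b'<projects><project name="helloworld" threads="1">'
    b'<description><![CDATA[This is a sample project]]></description>'
    b'<metadata><meta id="studioTags">helloworld</meta></metadata>'
    b'</project></projects>'
)


def write_subtree(xml_bytes, tag, out):
    """Pretty print the first `tag` child of the root to binary stream `out`."""
    parser = etree.XMLParser(remove_blank_text=True, strip_cdata=False)
    tree = etree.parse(BytesIO(xml_bytes), parser)
    element = tree.find(tag)
    if element is None:
        raise LookupError("no element matches %r" % tag)
    # Wrapping the element in its own ElementTree makes write() available.
    etree.ElementTree(element).write(out, pretty_print=True)


if __name__ == "__main__":
    buffer = BytesIO()
    write_subtree(SAMPLE, "project", buffer)
    print(buffer.getvalue().decode("utf-8"), end="")
```

Writing into a `BytesIO` first keeps the sketch independent of how stdout is set up; in a real script the buffer could be replaced by an opened binary file.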
