
Extract text between XML tags in Python

I have the XML string below and am trying to print the text inside the domain, receive_time, serial and seqno tags for each entry tag.

xml="""
<response status="success" code="19"><result><msg><line>query job enqueued with jobid 19032</line></msg><job>19032</job></result></response>
19032
<response status="success"><result>
  <job>
    <tenq>14:10:09</tenq>
    <tdeq>14:10:09</tdeq>
    <tlast>19:00:00</tlast>
    <status>ACT</status>
    <id>19032</id>
    <cached-logs>64</cached-logs>
  </job>
  <log>
    <logs count="20" progress="29">
      <entry logid="2473601">
        <domain>1</domain>
        <receive_time>2017/11/26 14:10:08</receive_time>
        <serial>007901004140</serial>
        <seqno>10156449120</seqno>
      </entry>
      <entry logid="2473601">
        <domain>1</domain>
        <receive_time>2017/11/26 14:10:08</receive_time>
        <serial>007901004140</serial>
        <seqno>10156449120</seqno>
      </entry>
      </logs>
  </log>
</result></response>
"""

I am using xml.etree.ElementTree. To get the text inside the domain tag I tried node.attrib.get('domain') and node.get('domain'). Please advise.

import xml.etree.ElementTree as ET
tree = ET.fromstring(xml)
for node in tree.iter('entry'):
    print(node)

It can be another Python library too; it does not have to be xml.etree. I do not want to print the text between tags blindly: I need to print the tag name followed by its text, i.e.:

domain: 1
receive_time: 2017/11/26 14:10:08
serial: 007901004140
seqno: 10156449120

etc

find() locates a single child such as domain, but to print every field it is simpler to loop over each entry element and its descendants. Each element's tag attribute holds the tag name and its text attribute holds the enclosed text:

import xml.etree.ElementTree as ET

tree = ET.fromstring(xml)
for node in tree.iter('entry'):
    print('\n')  # blank lines between entries
    # node.iter() yields the entry element itself first, so skip it
    for elem in node.iter():
        if elem.tag != node.tag:
            print("{}: {}".format(elem.tag, elem.text))

Hope this helps!

Output:

domain: 1
receive_time: 2017/11/26 14:10:08
serial: 007901004140
seqno: 10156449120


domain: 1
receive_time: 2017/11/26 14:10:08
serial: 007901004140
seqno: 10156449120
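
As a follow-up sketch (not part of the answer above): node.get('domain') returns None because get() reads attributes such as logid, while domain is a child element. The example below assumes a trimmed, single-document copy of the question's data, since the string as posted holds two response documents plus a bare job id and ET.fromstring would reject it as is. The names SAMPLE, WANTED and entries_as_dicts are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed, single-document sample based on the question's second <response>.
SAMPLE = """
<response status="success"><result>
  <log>
    <logs count="20" progress="29">
      <entry logid="2473601">
        <domain>1</domain>
        <receive_time>2017/11/26 14:10:08</receive_time>
        <serial>007901004140</serial>
        <seqno>10156449120</seqno>
      </entry>
    </logs>
  </log>
</result></response>
"""

# Tags named in the question (hypothetical constant for this sketch).
WANTED = ("domain", "receive_time", "serial", "seqno")


def entries_as_dicts(xml_text):
    """Return one {tag: text} dict per <entry> element."""
    root = ET.fromstring(xml_text)
    rows = []
    for entry in root.iter("entry"):
        # findtext() returns the text of the first matching child, or None.
        rows.append({tag: entry.findtext(tag) for tag in WANTED})
    return rows


first = next(ET.fromstring(SAMPLE).iter("entry"))
print(first.get("logid"))   # attribute on the <entry> tag itself
print(first.get("domain"))  # None: <domain> is a child element, not an attribute

for row in entries_as_dicts(SAMPLE):
    for tag, text in row.items():
        print("{}: {}".format(tag, text))
```

Collecting each entry into a dict keeps the tag names next to their values, so the same rows can be printed, filtered, or written out elsewhere.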

You can also use a SAX stream to get the inner text content of an XML element. SAX parses the XML as a stream of events, so unlike a DOM-style parser it does not read the whole document into memory, which makes it a better fit for large inputs. See Python SAX (the xml.sax module).
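
A minimal sketch of the SAX approach, assuming the standard-library xml.sax module and a trimmed sample of the question's data. The EntryHandler class and the SAMPLE constant are hypothetical names for this example, not something the answer supplied.

```python
import xml.sax

# Trimmed sample based on the question's data (bytes, single document).
SAMPLE = b"""<logs count="20" progress="29">
  <entry logid="2473601">
    <domain>1</domain>
    <receive_time>2017/11/26 14:10:08</receive_time>
    <serial>007901004140</serial>
    <seqno>10156449120</seqno>
  </entry>
</logs>"""


class EntryHandler(xml.sax.ContentHandler):
    """Collect the child tags of every <entry> as {tag: text} dicts."""

    def __init__(self):
        super().__init__()
        self.entries = []
        self._current = None  # dict for the <entry> being read, else None
        self._tag = None      # child tag whose text is being collected
        self._chunks = []

    def startElement(self, name, attrs):
        if name == "entry":
            self._current = {}
        elif self._current is not None:
            self._tag = name
            self._chunks = []

    def characters(self, content):
        # The parser may deliver one text node in several pieces.
        if self._tag is not None:
            self._chunks.append(content)

    def endElement(self, name):
        if name == "entry":
            self.entries.append(self._current)
            self._current = None
        elif self._current is not None and name == self._tag:
            self._current[name] = "".join(self._chunks).strip()
            self._tag = None


handler = EntryHandler()
xml.sax.parseString(SAMPLE, handler)

for entry in handler.entries:
    for tag, text in entry.items():
        print("{}: {}".format(tag, text))
```

The handler only keeps state for the entry currently being read, which is what lets this approach scale to inputs too large to hold as a tree.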
