Parse XML with Python ElementTree into line items

Using Python and ElementTree, I'm having trouble parsing XML into text line items such that each line item represents exactly one level, no more, no less. Each line item will eventually be one record in a database, so that the user can search on multiple terms within that field. Sample XML:

<?xml version="1.0" encoding="utf-8"?>
 <root>
    <mainTerm>
      <title>Meat</title>
      <see>protein</see>
    </mainTerm>
    <mainTerm>
      <title>Vegetables</title>
      <see>starch</see>
    </mainTerm>
    <mainTerm>
      <title>Fruit</nemod></title>
      <term level="1">
        <title>Apple</title>
        <code>apl</code>
      </term>
      <term level="1">
        <title>Red Delicious</title>
        <code>rd</code>
        <term level="2">
          <title>Large Red Delicious</title>
          <code>lrd</code>
        </term>
        <term level="2">
          <title>Medium Red Delicious</title>
          <code>mrd</code>
        </term>
        <term level="2">
          <title>Small Red Delicious</title>
          <code>mrd</code>
        </term>        
      <term level="1">
        <title>Grapes</title>
        <code>grp</code>
      </term>
      <term level="1">
        <title>Peaches</title>
        <code>pch</code>
      </term>      
    </mainTerm>
</root>

Desired output:

Meat > protein
Vegetables > starch
Fruit > Apple > apl
Fruit > Apple > apl > Red Delicious > rd
Fruit > Apple > apl > Red Delicious > rd > Large Red Delicious > lrd
Fruit > Apple > apl > Red Delicious > rd > Medium Red Delicious > mrd
Fruit > Apple > apl > Red Delicious > rd > Small Red Delicious > srd
Fruit > Grapes > grp
Fruit > Peaches > pch

It's easy enough to use the tag 'mainTerm' to parse the XML, but the tricky part is limiting each line to only one level while still including the upper-level terms in the text. I'm basically trying to "flatten" the XML hierarchy by creating unique lines of text, each of which lists its parents (e.g. Fruit > Apple > apl) but not its siblings (e.g. Large Red Delicious, Medium Red Delicious, or Small Red Delicious).

I realize this can be accomplished by first converting the data to a relational database format and then running a query, but I was hoping for a more direct solution straight from the XML.

Hope this makes sense... thanks

There is a nice tool called xmltodict that builds a hierarchical data structure right out of the XML:

import json
import xmltodict


data = """your xml goes here"""

result = xmltodict.parse(data)
print(json.dumps(result, indent=4))

For the XML you've provided (with several alterations to make it well-formed, see my comment) it prints:

{
    "root": {
        "mainTerm": [
            {
                "title": "Meat", 
                "see": "protein"
            }, 
            {
                "title": "Vegetables", 
                "see": "starch"
            }, 
            {
                "title": "Fruit", 
                "term": [
                    {
                        "@level": "1", 
                        "title": "Apple", 
                        "code": "apl"
                    }, 
                    {
                        "@level": "1", 
                        "title": "Red Delicious", 
                        "code": "rd", 
                        "term": [
                            {
                                "@level": "2", 
                                "title": "Large Red Delicious", 
                                "code": "lrd"
                            }, 
                            {
                                "@level": "2", 
                                "title": "Medium Red Delicious", 
                                "code": "mrd"
                            }, 
                            {
                                "@level": "2", 
                                "title": "Small Red Delicious", 
                                "code": "mrd"
                            }
                        ]
                    }, 
                    {
                        "@level": "1", 
                        "title": "Grapes", 
                        "code": "grp"
                    }, 
                    {
                        "@level": "1", 
                        "title": "Peaches", 
                        "code": "pch"
                    }
                ]
            }
        ]
    }
}
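
The nested structure still has to be flattened to get one line per level. Here is a minimal sketch of one way to do that, using only the standard library's xml.etree.ElementTree rather than xmltodict. It assumes a repaired, shortened copy of the sample in which "Red Delicious" gets its closing tag and the stray </nemod> is dropped. The name flatten and the rule for skipping title-only nodes are my own choices for illustration, not something from the question. Note that in the repaired XML "Red Delicious" is a sibling of "Apple", so its line starts with "Fruit > Red Delicious" and not "Fruit > Apple > apl" as in the desired output; nesting it inside the Apple term in the XML would produce the longer prefix.

```python
import xml.etree.ElementTree as ET

# Shortened, repaired copy of the sample (assumption: well-formed input).
SAMPLE = """<root>
  <mainTerm>
    <title>Meat</title>
    <see>protein</see>
  </mainTerm>
  <mainTerm>
    <title>Fruit</title>
    <term level="1">
      <title>Apple</title>
      <code>apl</code>
    </term>
    <term level="1">
      <title>Red Delicious</title>
      <code>rd</code>
      <term level="2">
        <title>Large Red Delicious</title>
        <code>lrd</code>
      </term>
    </term>
  </mainTerm>
</root>"""


def flatten(node, parents=()):
    """Yield one ' > '-joined line per node, prefixed by its ancestors' text."""
    # Text of this node's own leaf children (title, see, code), in document order.
    own = [child.text.strip() for child in node
           if child.tag != "term" and child.text and child.text.strip()]
    path = list(parents) + own
    nested = node.findall("term")  # direct children only, not deeper descendants
    # Skip nodes that carry only a title and merely group nested terms (Fruit).
    if len(own) > 1 or not nested:
        yield " > ".join(path)
    for child in nested:
        yield from flatten(child, path)


root = ET.fromstring(SAMPLE)
lines = [line for main in root.findall("mainTerm") for line in flatten(main)]
print("\n".join(lines))
```

Each recursive call passes the accumulated path down, so a line lists its ancestors but never its siblings. Every resulting string can then be stored as a single database record.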
