
Is there any Python XML parser that was designed with humans in mind?

I like Python, but I don't want to write 10 lines just to get an attribute from an element. Maybe it's just me, but minidom isn't that mini. The code I have to write to parse something with it looks a lot like Java code.

Is there something more user-friendly? Something with overloaded operators that maps elements to objects?

I'd like to be able to access this:


<root>
<node value="30">text</node>
</root>

as something like this:


obj = parse(xml_string)
print obj.node.value

and not by using getChildren or other methods like that.

You should take a look at ElementTree. It doesn't do exactly what you want, but it's a lot better than minidom. If I remember correctly, it has been included in the standard library since Python 2.5. For more speed, use cElementTree (on Python 3.3 and later, xml.etree.ElementTree uses the C accelerator automatically). For even more speed (and more features), you can use lxml (check the objectify API for your needs/approach).
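As a minimal sketch of what the ElementTree suggestion looks like for the XML in the question (written for Python 3; the variable names are only illustrative), reading the attribute and the text takes a couple of lines:

```python
import xml.etree.ElementTree as ET

# The sample document from the question.
xml_string = """<root>
<node value="30">text</node>
</root>"""

root = ET.fromstring(xml_string)   # the <root> element
node = root.find("node")           # first direct child named "node"

value = node.get("value")          # attribute lookup; the result is a string
text = node.text                   # text content of the element

print(value)
print(text)
```

This is still method-based access (find, get) rather than the obj.node.value style asked for, which is why lxml's objectify API is worth a look as well.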

I should add that BeautifulSoup does partly what you want. There's also Amara, which takes this approach.

I actually wrote a library that does things exactly the way you imagined. The library is called "xe" and you can get it from: http://home.avvanta.com/~steveha/xe.html

xe can import XML to let you work with the data in an object-oriented way. It actually uses xml.dom.minidom to do the parsing, but then it walks over the resulting tree and packs the data into xe objects.
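The general idea described above (parse with xml.dom.minidom, then walk the tree and pack the data into objects) can be sketched in a few lines of Python 3. This is not xe's actual implementation; the Element class and the parse function below are hypothetical names made up for illustration:

```python
from xml.dom import minidom


class Element:
    """Hypothetical wrapper: child elements and XML attributes become Python attributes."""

    def __init__(self, dom_node):
        self._name = dom_node.tagName
        self._text = ""
        # Copy XML attributes onto the object (values stay strings).
        for attr_name, attr_value in dom_node.attributes.items():
            setattr(self, attr_name, attr_value)
        # Walk the children: wrap child elements, collect text nodes.
        for child in dom_node.childNodes:
            if child.nodeType == child.ELEMENT_NODE:
                setattr(self, child.tagName, Element(child))
            elif child.nodeType == child.TEXT_NODE:
                self._text += child.data

    def __str__(self):
        return self._text.strip()


def parse(xml_string):
    """Parse with minidom, then pack the tree into Element objects."""
    dom = minidom.parseString(xml_string)
    return Element(dom.documentElement)


obj = parse('<root>\n<node value="30">text</node>\n</root>')
print(obj.node.value)
print(str(obj.node))
```

The sketch is deliberately naive: repeated child elements overwrite each other, and a child element with the same name as an attribute, or a name that is not a valid Python identifier, is not handled. A real library has to deal with those cases.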

EDIT: Okay, I went ahead and implemented your example in xe, so you can see how it works. Here are classes to implement the XML you showed:

import xe

class Node(xe.TextElement):
    def __init__(self, text="", value=None):
        xe.TextElement.__init__(self, "node", text)
        if value is not None:
            self.attrs["value"] = value

class Root(xe.NestElement):
    def __init__(self):
        xe.NestElement.__init__(self, "root")
        self.node = Node()

And here is an example of using the above. I put your sample XML into a file called "example.xml", but you could also just put it into a string and pass the string.

>>> root = Root()
>>> print root
<root/>
>>> root.import_xml("example.xml")
<Root object at 0xb7e0c52c>
>>> print root
<root>
    <node value="30">text</node>
</root>
>>> print root.node.attrs["value"]
30
>>>

Note that in this example, the type of "value" will be a string. If you really need attributes of another type, that's possible too with a little bit of work, but I didn't bother for this example. (If you look at PyFeed, there is a class for OPML that has an attribute that isn't text.)

I had the same need for a simple XML parser, and after spending a long time checking different libraries I found xmltramp.

Based on your example XML:

import xmltramp

xml_string = """<root>
<node value="30">text</node>
</root>"""

obj = xmltramp.parse(xml_string)
print obj.node('value')             # 30
print str(obj.node)                 # text

I didn't find anything more user-friendly.

I spent a fair bit of time going through the examples provided above and through the packages available through pip.

The easiest (and most Pythonic) way of parsing XML that I have found so far is xmltodict: https://github.com/martinblech/xmltodict

The example from the documentation on GitHub is copy-pasted below. It has made life very simple and easy for me many times.

>>> doc = xmltodict.parse("""
... <mydocument has="an attribute">
...   <and>
...     <many>elements</many>
...     <many>more elements</many>
...   </and>
...   <plus a="complex">
...     element as well
...   </plus>
... </mydocument>
... """)
>>>
>>> doc['mydocument']['@has']
u'an attribute'
>>> doc['mydocument']['and']['many']
[u'elements', u'more elements']
>>> doc['mydocument']['plus']['@a']
u'complex'
>>> doc['mydocument']['plus']['#text']
u'element as well'

It works really well and gave me just what I was looking for. However, if you need the reverse transformation, that is a different matter altogether.
