Which Python XML library should I use?
I am going to handle XML files for a project. I had earlier decided to use lxml, but after reading the requirements, I think ElementTree would be better for my purpose.
The XML files that have to be processed are:
Small in size, typically < 10 KB.
No namespaces.
Simple XML structure.
Given the small XML size, memory is not an issue. My only concern is fast parsing.
Which should I go with? Mostly I have seen people recommend lxml, but given my parsing requirements, do I really stand to benefit from it, or would ElementTree serve my purpose better?
As others have pointed out, lxml implements the ElementTree API, so you're safe starting out with ElementTree and migrating to lxml if you need better performance or more advanced features.
The big advantage of using ElementTree, if it meets your needs, is that as of Python 2.5 it is part of the Python standard library, which cuts down on external dependencies and the (possible) headache of compiling and installing C modules.
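Because the two libraries share the ElementTree API, one common way to keep the migration path open is an import fallback. The following is a minimal sketch, assuming your code only uses calls that both libraries provide; the helper name first_name is hypothetical and exists only for illustration.

```python
# Prefer lxml when it is installed; otherwise fall back to the standard library.
try:
    from lxml import etree as ET  # third-party, optional
except ImportError:
    import xml.etree.ElementTree as ET  # in the standard library since Python 2.5


def first_name(xml_text):
    """Return the text of the first <name> element, or None if there is none."""
    root = ET.fromstring(xml_text)
    # findtext() with a simple path expression is available in both libraries.
    return root.findtext(".//name")
```

For example, first_name("<a><p><name>fred</name></p></a>") returns "fred" with either backend. Code that relies on lxml-only features (full XPath, schema validation) would not survive the fallback, so this only helps while you stay within the shared API.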
lxml is basically a superset of ElementTree, so you could start with ElementTree and switch to lxml if you run into performance or functionality issues.
Performance issues can only be studied by you, using your own data.
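One way to measure this is with the standard timeit module. The sketch below is only an illustration: the SAMPLE document is made up (replace it with one of your real files), and the helper time_parser and the repeat/number values are assumptions, not recommendations. lxml is timed only if it happens to be installed.

```python
import timeit
import xml.etree.ElementTree as ET

# Made-up stand-in for a small, namespace-free file; use your own data instead.
SAMPLE = (
    "<address_book>"
    + "<person><name>fred</name><phone>54321</phone></person>" * 50
    + "</address_book>"
)


def time_parser(parse, xml_text, repeat=5, number=200):
    """Return the best observed time per parse call, in seconds."""
    timings = timeit.repeat(lambda: parse(xml_text), repeat=repeat, number=number)
    return min(timings) / number


results = {"ElementTree": time_parser(ET.fromstring, SAMPLE)}

try:
    from lxml import etree  # optional third-party dependency
    results["lxml"] = time_parser(etree.fromstring, SAMPLE)
except ImportError:
    pass  # lxml not installed; only the standard library is timed

for name, seconds in sorted(results.items()):
    print("%s: %.1f microseconds per parse" % (name, seconds * 1e6))
```

The numbers depend heavily on the machine, the Python version, and the shape of the documents, so treat them as a comparison on your own setup rather than a general result.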
I recommend my own recipe: XML to Python data structure (Python recipes, ActiveState Code).
It does not speed up parsing, but it provides really native object-style access.
>>> SAMPLE_XML = """<?xml version="1.0" encoding="UTF-8"?>
... <address_book>
... <person gender='m'>
... <name>fred</name>
... <phone type='home'>54321</phone>
... <phone type='cell'>12345</phone>
... <note>"A<!-- comment --><![CDATA[ <note>]]>"</note>
... </person>
... </address_book>
... """
>>> address_book = xml2obj(SAMPLE_XML)
>>> person = address_book.person
person.gender -> 'm' # an attribute
person['gender'] -> 'm' # alternative dictionary syntax
person.name -> 'fred' # shortcut to a text node
person.phone[0].type -> 'home' # multiple elements become a list
person.phone[0].data -> '54321' # use .data to get the text value
str(person.phone[0]) -> '54321' # alternative syntax for the text value
person[0] -> person # if there is only one <person>, it can still
# be used as if it were a list of 1 element.
'address' in person -> False # test for existence of an attr or child
person.address -> None # a nonexistent element returns None
bool(person.address) -> False # has any 'address' data (attr, child or text)
person.note -> '"A <note>"'
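For comparison, here is a rough sketch of similar lookups using only the standard library's ElementTree, with no recipe involved. It uses a simplified version of the sample document (the note element with its comment and CDATA section is left out), and the variable names are chosen only for illustration.

```python
import xml.etree.ElementTree as ET

SAMPLE_XML = """<address_book>
  <person gender="m">
    <name>fred</name>
    <phone type="home">54321</phone>
    <phone type="cell">12345</phone>
  </person>
</address_book>"""

root = ET.fromstring(SAMPLE_XML)
person = root.find("person")

gender = person.get("gender")      # attribute lookup
name = person.findtext("name")     # text of the first <name> child
# Collect every <phone> child as a (type, number) pair.
phones = [(p.get("type"), p.text) for p in person.findall("phone")]
address = person.find("address")   # None when the element does not exist
```

The access is more verbose than the object-style syntax above, but it needs nothing beyond the standard library.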