
Python ElementTree: ElementTree vs root Element

I'm a bit confused by some of the design decisions in the Python ElementTree API - they seem kind of arbitrary, so I'd like some clarification to see whether these decisions have some logic behind them, or whether they're more or less ad hoc.

So, generally there are two ways you might want to generate an ElementTree - one is via some kind of source stream, like a file or other I/O stream. This is achieved via the parse() function, or the ElementTree.parse() method.

Another way is to load the XML directly from a string object. This can be done via the fromstring() function.

Okay, great. Now, I would think these functions would basically be identical in terms of what they return - the difference between the two of them is basically the source of input (one takes a file or stream object, the other takes a plain string). Except, for some reason, the parse() function returns an ElementTree object, but the fromstring() function returns an Element object. The difference is basically that the Element object is the root element of an XML tree, whereas the ElementTree object is sort of a "wrapper" around the root element, which provides some extra features. You can always get the root element from an ElementTree object by calling getroot().
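For concreteness, here is a minimal sketch of the difference described above. The XML text and variable names are made up for illustration; only parse(), fromstring() and getroot() come from the question.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for illustration.
xml_text = "<root><child name='a'/></root>"

# parse() reads from a file or file-like object and returns an ElementTree.
tree = ET.parse(io.StringIO(xml_text))

# fromstring() reads from a string and returns the root Element directly.
root_from_string = ET.fromstring(xml_text)

# getroot() unwraps the ElementTree to reach the same kind of object.
root_from_tree = tree.getroot()

print(type(tree).__name__)              # ElementTree
print(type(root_from_string).__name__)  # Element
print(root_from_tree.tag == root_from_string.tag)  # True
```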

Still, I'm confused why we have this distinction. Why does fromstring() return a root element directly, but parse() returns an ElementTree object? Is there some logic behind this distinction?

A beautiful answer comes from this old discussion:

Just for the record: Fredrik [the creator of ElementTree] doesn't actually consider it a design "quirk". He argues that it's designed for different use cases. While parse() parses a file, which normally contains a complete document (represented in ET as an ElementTree object), fromstring() and especially the 'literal wrapper' XML() are made for parsing strings, which (most?) often only contain XML fragments. With a fragment, you normally want to continue doing things like inserting it into another tree, so you need the top-level element in almost all cases.
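As a rough illustration of the fragment use case described in the quote (the element names and content here are invented, not taken from the discussion), an Element returned by fromstring() can be appended to another tree without any unwrapping:

```python
import xml.etree.ElementTree as ET

# A document root, built from a string here only to keep the sketch short.
document = ET.fromstring("<book><chapter>One</chapter></book>")

# A fragment obtained from some other source (hypothetical content).
fragment = ET.fromstring("<chapter>Two</chapter>")

# fromstring() already returned an Element, so it can be appended as is.
document.append(fragment)

result = ET.tostring(document, encoding="unicode")
print(result)
```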

And:

Why isn't et.parse the only way to do this? Why have XML or fromstring at all?

Well, use cases. XML() is an alias for fromstring(), because it's convenient (and quite readable) to write

  section = XML('<section><title>A to Z</title></section>')
  section.append(paragraphs)

for XML literals in source code. fromstring() is there because when you want to parse a fragment from a string that you got from whatever source, it's easy to express that with exactly that function, as in

  el = fromstring(some_string) 

If you want to parse a document from a file or file-like object, use parse(). Three use cases, three functions. The fourth use case of parsing a document from a string does not have its own function, because it is trivial to write

  tree = parse(BytesIO(some_byte_string)) 
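A self-contained version of that one-liner might look like the sketch below. The byte string is a made-up sample, and the second form (wrapping the result of fromstring() in an ElementTree) is an alternative added here for comparison rather than something the quoted discussion proposes.

```python
from io import BytesIO
import xml.etree.ElementTree as ET

# Hypothetical document held in memory as bytes.
some_byte_string = b"<?xml version='1.0' encoding='utf-8'?><doc><item/></doc>"

# The approach from the quote: treat the bytes as a file-like object.
tree = ET.parse(BytesIO(some_byte_string))

# Alternative (an assumption, not from the quote): parse to an Element,
# then wrap it in an ElementTree explicitly.
tree_alt = ET.ElementTree(ET.fromstring(some_byte_string))

print(tree.getroot().tag, tree_alt.getroot().tag)
```

Either way the result is an ElementTree, so document-level methods such as write() are available afterwards.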

I'm thinking the same as remram in the comments: parse takes a file location or a file object and preserves that information so that it can provide additional utility, which is really helpful. If parse did not return an ET object, then you would have to keep better track of the sources and whatnot in order to manually feed them back into the helper functions that ET objects have by default. In contrast to files, strings, by definition, do not have the same kind of information attached to them, so you can't create the same utilities for them (otherwise there might very well be an ET.parsefromstring() method which would return an ET object).
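To make "helper functions that ET objects have" a little more concrete, here is a small sketch using write(), which is a method of ElementTree and not of Element. The sample XML is invented, and the in-memory buffer stands in for a real output file. Note that write() still takes its destination as an explicit argument.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical fragment parsed from a string: a bare Element.
root = ET.fromstring("<config><option>1</option></config>")
print(hasattr(root, "write"))  # False: Element has no write() method

# Wrapping the Element in an ElementTree gives access to write().
tree = ET.ElementTree(root)
buffer = io.BytesIO()  # stand-in for an output file
tree.write(buffer, encoding="utf-8", xml_declaration=True)

output = buffer.getvalue()
print(output.decode("utf-8"))
```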

I suspect this is also the logic behind the method being named parse instead of ET.fromfile(): I would expect the same object type to be returned from fromfile and fromstring, but can't say I would expect the same from parse (it's been a long time since I started using ET, so there's no way to verify that, but that's my feeling).

On the subject Remram raised of placing utility methods on Elements: as I understand the documentation, Elements are extremely uniform when it comes to implementation. People talk about "Root Elements," but the Element at the root of the tree is literally identical to all other Elements in terms of its class attributes and methods. As far as I know, Elements don't even know who their parent is, which likely supports this uniformity. Otherwise there might need to be more code to implement the "root" Element (which doesn't have a parent) or to re-parent subelements. It seems to me that the simplicity of the Element class works greatly in its favor. So it seems better to me to leave Elements largely agnostic of anything above them (their parent, the file they come from) so there can't be any snags concerning 4 Elements with different output files in the same tree (or the like).
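Since an Element holds no reference to its parent, code that needs one has to build the mapping itself. A minimal sketch, with an invented sample tree and a hypothetical `parent_of` dictionary:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample tree.
root = ET.fromstring("<a><b><c/></b><d/></a>")

# Map every child Element to its parent by walking the whole tree once.
parent_of = {child: parent for parent in root.iter() for child in parent}

c = root.find("./b/c")
print(parent_of[c].tag)   # b
print(root in parent_of)  # False: the root is the one Element with no parent
```

The root shows up in this mapping only as a value, never as a key, which is the only thing that distinguishes it from any other Element here.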

When it comes to using the module in code, it seems to me that the script has to recognize the input as a file at some point, one way or another (otherwise it would be trying to pass the file to fromstring). So a situation shouldn't arise in which the output of parse is unexpected, with the ElementTree assumed to be an Element and processed as such (unless, of course, parse was used without the programmer checking what parse does, which just seems like a poor habit to me).
