Go: Decoding only one XML node at a time
Looking through the source code of the encoding/xml package, all of the unmarshaling logic (which decodes the actual XML nodes and assigns them types) lives in unmarshal, and the only way to invoke it is essentially by calling DecodeElement. However, the unmarshaling logic also inherently searches for the next EndElement. The predominant reason for this seems to be validation. This looks like a major design flaw to me: what if I have a massive XML file, I am sufficiently confident in its structure, and I'd just like to decode a single node at a time so that I can efficiently filter the data on the fly? The RawToken() call can be used to get the current tag, which is great, but when you then call DecodeElement() on it, the inevitable unmarshal() call fails because it apparently runs into nodes in a way that it perceives as unbalanced.
It seems theoretically possible to encounter a token that I'd like to decode, capture the offset, decode the element, seek back to the previous position, and loop, but that would still result in a massive amount of unnecessary processing.
Is there no way to just parse one node at a time?
What you describe is called XML stream parsing, as done by any SAX parser, for example. Good news: encoding/xml supports that, although it is a bit hidden.
What you actually have to do is create an instance of xml.Decoder, passing it an io.Reader. Then you use Decoder.Token() to read the input stream until the next valid XML token is found. From there, you can decide what to do next.
Here is a little example, also available as a gist, or you can run it on the Playground:
package main

import (
	"bytes"
	"encoding/xml"
	"fmt"
)

const (
book = `<?xml version="1.0" encoding="UTF-8"?>
<book>
<preface>Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.</preface>
<chapter num="1" title="Foo">Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.</chapter>
<chapter num="2" title="Bar">Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.</chapter>
</book>`
)

// Chapter maps the attributes and text content of a <chapter> element.
type Chapter struct {
	Num     int    `xml:"num,attr"`
	Title   string `xml:"title,attr"`
	Content string `xml:",chardata"`
}

func main() {
	// We emulate a file or network stream
	b := bytes.NewBufferString(book)
	// And set up a decoder
	d := xml.NewDecoder(b)
	for {
		// We look for the next token.
		// Note that this only reads until the next positively identified
		// XML token in the stream.
		t, err := d.Token()
		if err != nil {
			// Token returns io.EOF at the end of the input;
			// any error ends the loop here.
			break
		}
		switch et := t.(type) {
		case xml.StartElement:
			// We now have to inspect whether we are interested in the
			// element; otherwise we simply advance.
			if et.Name.Local == "chapter" {
				// Most often/likely element first
				var c Chapter
				// We decode the element into c, which automatically advances
				// the stream past the matching end element.
				// If no matching end element is found, there will be an error.
				// Note the search only happens within the parent.
				if err := d.DecodeElement(&c, &et); err != nil {
					panic(err)
				}
				// We have found what we are interested in, so we print it
				fmt.Printf("%d: %s\n", c.Num, c.Title)
			} else if et.Name.Local == "book" {
				fmt.Println("Book begins!")
			}
		case xml.EndElement:
			if et.Name.Local != "book" {
				continue
			}
			fmt.Println("Finished processing book!")
		}
	}
}