Go: Decoding only one XML node at a time

Looking through the source code for the encoding/xml package, all of the unmarshaling logic (which decodes the actual XML nodes and types them) is in unmarshal, and the only way to invoke it is essentially by calling DecodeElement. However, the unmarshaling logic also inherently seeks out the next EndElement. The predominant reason for this seems to be validation. This looks like a major design flaw to me: what if I have a massive XML file, I am sufficiently confident in its structure, and I'd just like to decode a single node at a time so that I can efficiently filter through the data on the fly? The RawToken() call can be used to get the current tag, which is great, but when you then call DecodeElement() on it, you get an error, apparently because the inevitable unmarshal() call runs into nodes that it perceives as unbalanced.

It seems theoretically possible to encounter a token that I'd like to decode, capture the offset, decode the element, seek back to the previous position, and loop, but that would still result in a massive amount of unnecessary processing.

Is there no way to just parse one node at a time?

What you describe is called XML stream parsing, which is what any SAX parser does, for example. Good news: encoding/xml supports that, although it is a bit hidden.

What you actually have to do is create an instance of xml.Decoder, passing it an io.Reader. Then you use Decoder.Token() to read the input stream until the next valid XML token is found. From there, you can decide what to do next.

Here is a small example (also available as a gist, and you can run it on the Go Playground):

package main

import (
    "bytes"
    "encoding/xml"
    "fmt"
    "io"
)

const (
    book = `<?xml version="1.0" encoding="UTF-8"?>
<book>
  <preface>Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.</preface>
  <chapter num="1" title="Foo">Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.</chapter>
  <chapter num="2" title="Bar">Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.</chapter>
</book>`
)

type Chapter struct {
    Num     int    `xml:"num,attr"`
    Title   string `xml:"title,attr"`
    Content string `xml:",chardata"`
}

func main() {

    // We emulate a file or network stream
    b := bytes.NewBufferString(book)

    // And set up a decoder
    d := xml.NewDecoder(b)

    for {

        // We look for the next token
        // Note that Token returns as soon as the next XML token has been
        // parsed; it does not parse the rest of the document
        t, err := d.Token()

        if err == io.EOF {
            // End of input: we are done
            break
        }
        if err != nil {
            // Anything else is a real read or syntax error
            panic(err)
        }

        switch et := t.(type) {

        case xml.StartElement:
            // We now have to inspect whether we are interested in the element,
            // otherwise we simply advance to the next token
            if et.Name.Local == "chapter" {
                // Most often/likely element first

                var c Chapter

                // We decode the element into c. DecodeElement reads up to and
                // including the matching end element, so the stream advances
                // automatically. If no matching end element is found, there
                // will be an error.
                if err := d.DecodeElement(&c, &et); err != nil {
                    panic(err)
                }

                // We have found what we are interested in, so we print it
                fmt.Printf("%d: %s\n", c.Num, c.Title)

            } else if et.Name.Local == "book" {
                fmt.Println("Book begins!")
            }

        case xml.EndElement:

            if et.Name.Local != "book" {
                continue
            }

            fmt.Println("Finished processing book!")
        }
    }
}
