
Mixed XML decoding in Go, preserving order

I need to extract the offers from an XML document while preserving the order of the nodes:

<items>
  <offer/>
  <product>
    <offer/>
    <offer/>
  </product>
  <offer/>
  <offer/>
</items>

The following struct decodes the values, but into two separate slices, so the original order is lost:

type Offers struct {
    XMLName  xml.Name `xml:"items"`
    Offers   []offer  `xml:"offer"`         // <offer> children of <items>
    Products []offer  `xml:"product>offer"` // <offer> elements inside <product>
}

Any ideas?

One way is to implement the UnmarshalXML method yourself, which turns your type into an xml.Unmarshaler. Let's say our input looks like this:

<doc>
    <head>My Title</head>
    <p>A first paragraph.</p>
    <p>A second one.</p>
</doc>

We want to deserialize the document and preserve the order of the head and the paragraphs. To preserve order we need a slice. To hold both head and p elements in that one slice, we need a single element type that can represent either, so we use a struct with an interface{} field. We could define our document like this:

type Document struct {
    XMLName  xml.Name `xml:"doc"`
    Contents []Mixed  `xml:",any"`
}

The ,any tag option collects every child element that no other field matches into Contents, in document order. Its element type is Mixed, which we still need to define:

type Mixed struct {
    Type  string      // just keep "head" or "p" in here
    Value interface{} // keep the value, we could use string here, too
}

We need more control over the deserialization process, so we turn Mixed into an xml.Unmarshaler by implementing UnmarshalXML. We choose the code path based on the name of the start element, e.g. head or p. Here we only populate our Mixed struct with a few values, but you can do basically anything in this method:

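// UnmarshalXML makes *Mixed an xml.Unmarshaler; the decoder calls it once
// for each child element collected by the ,any field.
func (m *Mixed) UnmarshalXML(d *xml.Decoder, start xml.StartElement) error {
    switch start.Name.Local {
    case "head", "p":
        var e string
        // Decode the element's character data into e.
        if err := d.DecodeElement(&e, &start); err != nil {
            return err
        }
        m.Value = e
        m.Type = start.Name.Local
    default:
        return fmt.Errorf("unknown element: %s", start.Name.Local)
    }
    return nil
}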

Putting it all together, usage of the above structs could look like this:

// main uses encoding/xml, fmt and log from the standard library.
func main() {
    s := `
    <doc>
        <head>My Title</head>
        <p>A first paragraph.</p>
        <p>A second one.</p>
    </doc>
    `

    var doc Document
    if err := xml.Unmarshal([]byte(s), &doc); err != nil {
        log.Fatal(err)
    }
    fmt.Printf("%v\n", doc)
}

This prints:

{{ doc} [{head My Title} {p A first paragraph.} {p A second one.}]}

We preserved the order and kept some type information. Instead of a single type like Mixed, you could decode into several different types. The cost of this approach is that the decoded values (here, the Value field of each Mixed in the document's Contents) are interface values, so to do anything element-specific you'll need a type assertion or a helper method.

Complete code on the Go Playground: https://play.golang.org/p/fzsUPPS7py
