
XmlSerializer filter by attribute

When deserializing XML into an entity class using XmlSerializer, is it possible to filter by an attribute? For example, let's say I have an item which can be of type "a" or type "b". I want to deserialize only the items of type "a" and ignore all the others.

I need this because my real situation is that our endpoint receives very big XMLs (some can be upwards of 100MB) with hundreds of thousands of tags of type <item> but I need only some of them - those of type "a". I want to avoid allocations for the rest (including their child tags which are not few).

Example XML:

<root>
  <item type="a"/>
  <item type="a"/>
  <item type="b"/>
  <item type="c"/>
</root>

Entities:

[XmlRoot("root")]
public class Root {
    [XmlElement("item")]
    public Item[] Items { get; set; }
}
public class Item  {
    [XmlAttribute("type")]
    [DeserializeIfValueIs("a")] // <-- Is there something like this?
    public string Type { get; set; }
}

Code:

var serializer = new XmlSerializer(typeof(Root));
var dto = (Root) serializer.Deserialize(XmlReader.Create("input.xml"));
// Show the results - {"Items":[{"Type":"a"},{"Type":"a"},{"Type":"b"},{"Type":"c"}]}
Console.WriteLine(JsonConvert.SerializeObject(dto));

How do I make it allocate objects only for type "a" items?

Obligatory note: This is neither an XY problem nor premature optimization. We have identified that we need to improve performance in this with profiling and so on. Also filtering out the values post-deserialization doesn't help - by that time the allocations have already been made and will have to be garbage-collected.

This is possible by handling the deserialization process ourselves (at least for the root class).

Note that the XML you provided is too small to test against properly, so this is a very basic implementation. It should work for you directly, or with a few small tweaks.

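First of all, we mark the Item class with [XmlRoot("item")]. This is needed because each <item> element will be deserialized on its own by a separate XmlSerializer(typeof(Item)), and that serializer expects the element name to match the root element name declared for the type.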

[XmlRoot("item")]
public class Item
{
    [XmlAttribute("type")]
    public string Type { get; set; }

    [XmlElement("prop1")]
    public int Prop1 { get; set; }
}

I've also added a simple integer property to prove that the deserialization works as expected.

I also changed the XML content to match the new type, for testing.

<root>
  <item type="b">
    <prop1>5</prop1>
  </item>
  <item type="a">
    <prop1>5</prop1>
  </item>
  <item type="a">
    <prop1>5</prop1>
  </item>
  <item type="b">
    <prop1>5</prop1>
  </item>
  <item type="c">
    <prop1>5</prop1>
  </item>
</root>

Next comes the Root class, which now implements IXmlSerializable explicitly:

[XmlRoot("root")]
public class Root : IXmlSerializable
{
    [XmlElement("item")]
    public Item[] Items { get; set; }

    // Only deserialization is needed here, so WriteXml is not implemented.
    // GetSchema returns null, as the IXmlSerializable documentation recommends.
    System.Xml.Schema.XmlSchema IXmlSerializable.GetSchema() { return null; }
    void IXmlSerializable.WriteXml(System.Xml.XmlWriter writer) { throw new NotImplementedException(); }

    void IXmlSerializable.ReadXml(System.Xml.XmlReader reader)
    {
        // On entry the reader is positioned on the <root> start tag.

        // Maintain a list to keep items with type "a"
        List<Item> typeAItems = new List<Item>();

        // Create a serializer for a single <item> element
        XmlSerializer deserializer = new XmlSerializer(typeof(Item));

        // An empty <root/> has no children and no end tag to consume
        bool isEmpty = reader.IsEmptyElement;
        reader.ReadStartElement();
        if (!isEmpty)
        {
            // Deserialize() and Skip() both leave the reader just past the
            // element they consumed, so the loop must not call Read() as well;
            // doing so would silently drop every other <item> whenever there
            // is no whitespace between elements. MoveToContent() only steps
            // over whitespace and comments.
            while (reader.MoveToContent() == System.Xml.XmlNodeType.Element)
            {
                if (reader.Name == "item" && reader.GetAttribute("type") == "a")
                {
                    // Deserializes the current <item> into an Item object
                    typeAItems.Add((Item)deserializer.Deserialize(reader));
                }
                else
                {
                    // Skip the element with all its children
                    reader.Skip();
                }
            }
            reader.ReadEndElement(); // consume </root>
        }
        Items = typeAItems.ToArray();
    }
}

The calling code stays the same as in the question: new XmlSerializer(typeof(Root)).Deserialize(...).

What remains is to test it against your real data.
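
For a quick test, here is a minimal self-contained sketch of a variation on the same idea (it is not the code above): the filtering loop is moved into an iterator, so matching items are handed to the caller one at a time instead of being collected into an array. ItemFilter and ReadItems are hypothetical names made up for this example, and the element name "root" is taken from the sample XML. The test input deliberately has no whitespace between elements, which is the case where a loop that calls Read() after Skip() or Deserialize() would lose items.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Serialization;

[XmlRoot("item")]
public class Item
{
    [XmlAttribute("type")]
    public string Type { get; set; }

    [XmlElement("prop1")]
    public int Prop1 { get; set; }
}

public static class ItemFilter
{
    // Hypothetical helper, not part of XmlSerializer: lazily yields only the
    // <item> elements whose "type" attribute equals wantedType.
    public static IEnumerable<Item> ReadItems(XmlReader reader, string wantedType)
    {
        var serializer = new XmlSerializer(typeof(Item));

        reader.MoveToContent();              // position on <root>
        bool isEmpty = reader.IsEmptyElement;
        reader.ReadStartElement("root");     // step inside <root> (or past <root/>)
        if (isEmpty) yield break;

        // Deserialize() and Skip() both leave the reader just after the element
        // they consumed, so the loop never calls Read() itself.
        while (reader.MoveToContent() == XmlNodeType.Element)
        {
            if (reader.Name == "item" && reader.GetAttribute("type") == wantedType)
                yield return (Item)serializer.Deserialize(reader);
            else
                reader.Skip();               // discard the element and its children
        }
    }
}

public static class Program
{
    public static void Main()
    {
        // No whitespace between elements, on purpose
        const string xml =
            "<root><item type=\"b\"><prop1>1</prop1></item>" +
            "<item type=\"a\"><prop1>2</prop1></item>" +
            "<item type=\"a\"><prop1>3</prop1></item>" +
            "<item type=\"c\"/></root>";

        using (var reader = XmlReader.Create(new StringReader(xml)))
        {
            // Prints "a 2" and then "a 3"
            foreach (Item item in ItemFilter.ReadItems(reader, "a"))
                Console.WriteLine(item.Type + " " + item.Prop1);
        }
    }
}
```

Because the iterator is lazy, the reader has to stay open until the enumeration has finished. If you need the items after the reader is disposed, copy them into a list inside the using block.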
