
XML serialization and deserialization of objects containing invalid characters in properties

I know this has been asked many times before, but I still don't see a good solution.
I have an object like this:

public class DTO
{
    public string Value;
}

I need to serialize it in the Exporter app and then deserialize it in the Importer.
The object's Value may contain characters that are not valid in XML (e.g. 0x08). I need either the Exporter to remove such characters or the Importer to successfully load an object containing them. I would rather not clean up objects before serialization, because I have dozens of them with dozens of string properties each.

  1. Exporter side. If I enable CheckCharacters here, I get an error at the serialization step, and I don't see a way to control all strings in one place. If I disable it, the XML will contain the invalid char.

     XmlWriterSettings xmlWriterSettings = new XmlWriterSettings { CheckCharacters = false };
     XmlSerializer xmlSerializer = new XmlSerializer(typeof(DTO));
     StringBuilder sb = new StringBuilder();
     DTO dto = new DTO { Value = Convert.ToChar(0x08).ToString() };
     using (XmlWriter xmlWriter = XmlWriter.Create(sb, xmlWriterSettings))
     {
         xmlSerializer.Serialize(xmlWriter, dto);
         xmlWriter.Flush();
         xmlWriter.Close();
     }
  2. Importer side. OK, if I let the invalid char go into the XML, there is no way to handle it on the Importer side. Even with CheckCharacters = false, the error occurs on the Deserialize() call:

     var _reader = XmlReader.Create(File.OpenText(path), new XmlReaderSettings() { CheckCharacters = false });
     _reader.MoveToContent();
     var outerXml = _reader.ReadOuterXml();
     xmlSerializer.Deserialize(new StringReader(outerXml)); // <== getting error here

Is there a way to remove invalid chars in either step and let the object be exported/imported without errors?

That was my bad :(
Here:

var outerXml = _reader.ReadOuterXml();
xmlSerializer.Deserialize(new StringReader(outerXml)); // <== getting error here

xmlSerializer was actually using an implicitly created internal XmlReader that did check characters. All I had to do four hours ago was pass my own reader directly:

xmlSerializer.Deserialize(_reader);
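Putting both halves together, here is a minimal end-to-end sketch, assuming .NET's System.Xml and XmlSerializer and reusing the DTO from the question. `DtoXmlRoundTrip`, `Export` and `Import` are hypothetical names for illustration. The writer has CheckCharacters = false so serializing 0x08 does not throw, and the reader handed to Deserialize also has CheckCharacters = false, so XmlSerializer never builds its own character-checking reader.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

public class DTO
{
    public string Value;
}

// Hypothetical helper class; not part of any library.
public static class DtoXmlRoundTrip
{
    // Serializes without character checking, so a value such as "\b" (0x08) does not throw.
    public static string Export(DTO dto)
    {
        var serializer = new XmlSerializer(typeof(DTO));
        var sb = new StringBuilder();
        var settings = new XmlWriterSettings { CheckCharacters = false };
        using (XmlWriter writer = XmlWriter.Create(sb, settings))
        {
            serializer.Serialize(writer, dto);
        }
        return sb.ToString();
    }

    // Deserializes through our own non-checking XmlReader.
    // Passing a TextReader here instead would make XmlSerializer create
    // an internal reader that checks characters and fails on 0x08.
    public static DTO Import(string xml)
    {
        var serializer = new XmlSerializer(typeof(DTO));
        var settings = new XmlReaderSettings { CheckCharacters = false };
        using (var text = new StringReader(xml))
        using (XmlReader reader = XmlReader.Create(text, settings))
        {
            return (DTO)serializer.Deserialize(reader);
        }
    }
}
```

Keep in mind that the exported text is then not strictly valid XML 1.0, so any other consumer that checks characters may still reject it; stripping the characters on export avoids that.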

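I'm not saying this is a great solution, but the code below removes characters that are not allowed in XML (such as 0x08) when serializing:

    using System.Linq;
    using System.Xml;

    public class DTO
    {
        private string _value;

        // XmlSerializer reads the getter on export, so invalid XML chars are stripped there.
        // Note: XmlConvert.IsXmlChar rejects surrogate halves, so non-BMP characters (e.g. emoji) are dropped too.
        public string Value
        {
            get { return _value == null ? null : new string(_value.Where(XmlConvert.IsXmlChar).ToArray()); }
            set { _value = value; }
        }
    }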
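If you would rather strip the characters than carry them through, a helper along these lines could be called from each property getter or from a single place before export. This is a sketch, not a library API: `XmlSanitizer.RemoveInvalidXmlChars` is a hypothetical name. It relies on `XmlConvert.IsXmlChar` and `XmlConvert.IsXmlSurrogatePair` (available since .NET Framework 4.0) so that valid surrogate pairs, such as emoji, survive while control characters and lone surrogates are dropped.

```csharp
using System.Text;
using System.Xml;

// Hypothetical helper class; not part of any library.
public static class XmlSanitizer
{
    // Returns a copy of s without characters that XML 1.0 does not allow (e.g. 0x08).
    public static string RemoveInvalidXmlChars(string s)
    {
        if (string.IsNullOrEmpty(s))
            return s;

        var sb = new StringBuilder(s.Length);
        for (int i = 0; i < s.Length; i++)
        {
            char c = s[i];
            if (XmlConvert.IsXmlChar(c))
            {
                sb.Append(c);
            }
            else if (char.IsHighSurrogate(c) && i + 1 < s.Length
                     && XmlConvert.IsXmlSurrogatePair(s[i + 1], c))
            {
                // IsXmlSurrogatePair takes the low surrogate first, then the high one.
                sb.Append(c).Append(s[i + 1]);
                i++; // skip the low surrogate we just copied
            }
            // Anything else (0x08 and other control chars, lone surrogates, U+FFFE/U+FFFF) is skipped.
        }
        return sb.ToString();
    }
}
```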
