
Should I be worried about encoding during serialization?

public string Serialize(BackgroundJobInfo info)
{
    var stringBuilder = new StringBuilder();
    using (var stringWriter = new StringWriter(stringBuilder, CultureInfo.InvariantCulture))
    {
        var writer = XmlWriter.Create(stringWriter);
        ...

By default, StringWriter advertises itself as UTF-16, whereas XML is usually UTF-8. I can fix this by subclassing StringWriter:

public class Utf8StringWriter : StringWriter
{
    public override Encoding Encoding
    {
         get { return Encoding.UTF8; }
    }
}

But why should I worry about that? What happens if I use a plain StringWriter (as I did) instead of Utf8StringWriter? Will that cause a bug?

After that, I will write this string to MongoDb.

StringWriter's Encoding property is actually not that useful, because the underlying thing it writes to is a StringBuilder, which produces a .NET string. .NET strings are stored internally as UTF-16, but that is an implementation detail you don't have to worry about. Encoding is just a property inherited from TextWriter, because a TextWriter can potentially write to targets where the encoding does matter (Stream, byte[], ...).

In the end, you will end up with a plain old string. The encoding used to turn that string into bytes later on is not fixed yet, and if you're using a MongoDb client implementation that takes a string as an argument, it is not even your concern!
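To illustrate that point, here is a minimal sketch showing that a string has no fixed byte encoding until something converts it to bytes. The class name EncodingChoiceDemo and the sample XML text are made up for illustration; they are not from the question.

```csharp
using System;
using System.Text;

// Hypothetical demo class, for illustration only.
public static class EncodingChoiceDemo
{
    // Converts the same string to bytes with a caller-chosen encoding.
    public static byte[] ToBytes(string text, Encoding encoding)
    {
        return encoding.GetBytes(text);
    }

    public static void Main()
    {
        string xml = "<job />"; // stand-in for the serialized output

        byte[] utf8 = ToBytes(xml, Encoding.UTF8);
        byte[] utf16 = ToBytes(xml, Encoding.Unicode); // UTF-16 little-endian

        Console.WriteLine(utf8.Length);  // 7: one byte per ASCII character
        Console.WriteLine(utf16.Length); // 14: two bytes per character

        // Both byte arrays decode back to the same string.
        Console.WriteLine(Encoding.UTF8.GetString(utf8) == Encoding.Unicode.GetString(utf16)); // True
    }
}
```

The encoding only comes into play in GetBytes, which is typically the job of whatever layer finally writes the string out (here, presumably the database driver).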


On a side note, overriding the getter of the Encoding property would not change how the encoding is performed internally, even if StringWriter actually did any encoding.
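As a rough sketch of what the override does and does not affect: as far as I know, XmlWriter.Create(TextWriter) reads the writer's Encoding property only to fill in the encoding attribute of the XML declaration. The characters in the resulting string are otherwise the same. The DeclarationDemo class and the "job" element name are assumptions made for this example.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// Same idea as the subclass in the question.
public sealed class Utf8StringWriter : StringWriter
{
    public override Encoding Encoding => Encoding.UTF8;
}

// Hypothetical demo class, for illustration only.
public static class DeclarationDemo
{
    // Writes a single empty element through an XmlWriter and returns the text.
    public static string WriteEmptyJob(StringWriter target)
    {
        using (var writer = XmlWriter.Create(target))
        {
            writer.WriteStartElement("job");
            writer.WriteEndElement();
        } // disposing the XmlWriter flushes it into the StringWriter

        return target.ToString();
    }

    public static void Main()
    {
        // Declaration is expected to say encoding="utf-16".
        Console.WriteLine(WriteEmptyJob(new StringWriter()));

        // Declaration is expected to say encoding="utf-8"; the rest is identical.
        Console.WriteLine(WriteEmptyJob(new Utf8StringWriter()));
    }
}
```

So the override changes only the label inside the document. That label can still matter if the text is later stored as UTF-8 bytes and handed to a parser that trusts the declaration, which is the usual reason people reach for Utf8StringWriter.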
