
Converting Base64 to string inserts whitespaces

I'm trying to convert a Base64 encoded string to text. I'm using the following code:

public static string Base64Decode(string base64EncodedData)
{
    var base64EncodedBytes = System.Convert.FromBase64String(base64EncodedData);
    return System.Text.Encoding.UTF8.GetString(base64EncodedBytes);
}

It works, in a sense, but it puts whitespace after each character. Furthermore, it adds an invalid character at the beginning of the converted string. The content of the Base64 string is XML, so when it is converted to text with the extra whitespace, the XML becomes invalid. Is there any alternative to this?

Here's a sample output after conversion:

? < ? x m l  v e r s i o n = " 1 . 0 "  e n c o d i n g = " U T F - 1 6 "  s t a n d a l o n e = " n o " ? >   < I m p o r t >     < o p t i o n s >       < P r o c N a m e > E R P N u m b e r < / P r o c N a m e >       < J o b I D > A N L 0 0 1 8 5 0 < / J o b I D >     < / o p t i o n s >     < R o w >       < D o c I d  / >       < E R P N u m b e r  / >     < / R o w >   < / I m p o r t > 

It looks like the original binary data is a string converted to bytes using UTF-16, which matches the encoding="UTF-16" part of the text. You need to use the right encoding when converting the binary data back to a string:

// Encoding.Unicode is UTF-16 little-endian (requires using System.Text;)
return Encoding.Unicode.GetString(base64EncodedBytes);

That's assuming you can't change what's producing the data in the first place. If you can change that to use UTF-8 instead, you'll end up with half as much data if the text is all ASCII characters...
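To make the difference concrete, here is a minimal sketch that decodes the same bytes both ways. The class and method names are hypothetical, and it assumes the payload is UTF-16 little-endian, as the encoding="UTF-16" declaration suggests. One caveat: Encoding.GetString does not strip a byte order mark, so if the data starts with one, the decoded string begins with U+FEFF; the sketch trims it explicitly.

```csharp
using System;
using System.Text;

public static class Utf16Base64Demo
{
    // Decodes Base64 whose payload is assumed to be UTF-16 little-endian text.
    public static string DecodeUtf16(string base64EncodedData)
    {
        byte[] bytes = Convert.FromBase64String(base64EncodedData);
        // GetString keeps a leading byte order mark as U+FEFF, so trim it.
        return Encoding.Unicode.GetString(bytes).TrimStart('\uFEFF');
    }

    // Decodes the same bytes as UTF-8, reproducing the problem in the question:
    // every zero high byte of an ASCII character becomes a separate U+0000.
    public static string DecodeUtf8(string base64EncodedData)
    {
        byte[] bytes = Convert.FromBase64String(base64EncodedData);
        return Encoding.UTF8.GetString(bytes);
    }
}
```

For the four-character string "<a/>" encoded as UTF-16, DecodeUtf8 returns eight characters (each original character followed by a NUL, which many viewers render as a space), while DecodeUtf16 returns the original four. The same string takes 8 bytes in UTF-16 and 4 bytes in UTF-8, which is the size saving mentioned above.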

As Jon Skeet explained in his answer, the string appears to be encoded in UTF-16, not UTF-8.

You also wrote

Furthermore, it adds an invalid character in the beginning of converted string.

This invalid character is almost certainly a byte order mark, a small prefatory sequence of bytes that indicates the specific encoding used in the stream. Given its presence, you can configure a StreamReader to detect the encoding from the byte order mark by using the new StreamReader(Stream, true) constructor:

public static string Base64Decode(string base64EncodedData)
{
    var base64EncodedBytes = System.Convert.FromBase64String(base64EncodedData);
    // true = detect the encoding from the byte order mark (requires using System.IO;)
    using (var reader = new StreamReader(new MemoryStream(base64EncodedBytes), true))
    {
        return reader.ReadToEnd();
    }
}

Note that the StreamReader will consume the byte order mark during processing so it is not included in the returned string.
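A quick way to check this behaviour is a round trip. The sketch below is only an illustration: EncodeUtf16WithBom is a hypothetical helper that builds test input shaped like the data in the question (a UTF-16 little-endian byte order mark followed by the text), and Base64Decode repeats the method above. Keep in mind that detection here relies on the byte order mark; without one, the reader falls back to its default of UTF-8.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

public static class BomDetectionDemo
{
    // Hypothetical helper: Base64 of UTF-16 LE text preceded by its BOM (FF FE).
    public static string EncodeUtf16WithBom(string text)
    {
        byte[] bom = Encoding.Unicode.GetPreamble();
        byte[] body = Encoding.Unicode.GetBytes(text);
        return Convert.ToBase64String(bom.Concat(body).ToArray());
    }

    // Same approach as above: let StreamReader pick the encoding from the BOM.
    public static string Base64Decode(string base64EncodedData)
    {
        byte[] bytes = Convert.FromBase64String(base64EncodedData);
        using (var reader = new StreamReader(new MemoryStream(bytes), true))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Calling BomDetectionDemo.Base64Decode(BomDetectionDemo.EncodeUtf16WithBom("<a/>")) should give back "<a/>" with no leading U+FEFF and no interleaved NUL characters.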

Alternatively, since your Base64 data is actually XML, and XML contains its own encoding declaration, you could extract the byte array and parse it directly using an XmlReader:

public static XmlReader CreateXmlReaderFromBase64(string base64EncodedData, XmlReaderSettings settings = null)
{
    var base64EncodedBytes = System.Convert.FromBase64String(base64EncodedData);
    return XmlReader.Create(new MemoryStream(base64EncodedBytes), settings);
}

According to the docs, XmlReader.Create(Stream) will detect the encoding as required:

The XmlReader scans the first bytes of the stream looking for a byte order mark or other sign of encoding. When encoding is determined, the encoding is used to continue reading the stream, and processing continues parsing the input as a stream of (Unicode) characters.
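As a usage sketch, the reader can be handed straight to XDocument.Load, so the text never passes through a manually chosen Encoding. The class and method names below are hypothetical, and the input is assumed to be well-formed XML with a byte order mark, as in the question.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class XmlFromBase64Demo
{
    // Parses Base64-encoded XML bytes, letting XmlReader detect the encoding
    // from the byte order mark and the XML declaration.
    public static XDocument LoadXmlFromBase64(string base64EncodedData)
    {
        byte[] bytes = Convert.FromBase64String(base64EncodedData);
        using (var stream = new MemoryStream(bytes))
        using (var reader = XmlReader.Create(stream))
        {
            return XDocument.Load(reader);
        }
    }
}
```

With the sample document from the question, LoadXmlFromBase64(data).Root.Element("options").Element("JobID").Value would then read the job ID directly from the parsed tree.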
