
How can I convert an XElement to a byte array for a PutFile operation?

I need to convert a big XElement to a byte array so that it can be uploaded later to a fileshare. What is the correct method to call to do that?

Below you see the signature of a method fileShare.PutFile that is internal:

void PutFile(string folder, string fileName, byte[] content);

Then, given an XElement xml, I tried converting it to a byte array by encoding its XElement.Value using Encoding.Default.GetBytes() as follows:

byte[] bytes = Encoding.Default.GetBytes(xml.Value);
fileShare.PutFile(folderName, blobName, bytes);

I am not sure that xml.Value (XElement.Value) is really what the GetBytes method needs, though. Is this correct?

To test this, I spun up a console app and put in some fake data. I did this for the XElement:

XElement root = new XElement("Root",
            new XElement("Child1", 1),
            new XElement("Child2", 2),
            new XElement("Child3", 3),
            new XElement("Child4", 4),
            new XElement("Child5", 5),
            new XElement("Child6", 6)
        );

Then I tried the same line of code to convert it to a byte array:

byte[] bytes = Encoding.Default.GetBytes(root.Value);

When I step over that line and look at the Autos window, the bytes variable is a byte[6], and when I expand it I see that [0] = 49 and so on.

Now this may not mean it is not working... or does it mean that? How can I interpret the contents of the bytes array, to check whether it is correct?

Firstly, using Encoding.Default is not recommended. From the docs:

Warning

Different computers can use different encodings as the default, and the default encoding can change on a single computer. If you use the Default encoding to encode and decode data streamed between computers or retrieved at different times on the same computer, it may translate that data incorrectly. In addition, the encoding returned by the Default property uses best-fit fallback to map unsupported characters to characters supported by the code page. For these reasons, using the default encoding is not recommended. To ensure that encoded bytes are decoded properly, you should use a Unicode encoding, such as UTF8Encoding or UnicodeEncoding. You could also use a higher-level protocol to ensure that the same format is used for encoding and decoding.
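To make the warning concrete, here is a minimal sketch of what happens when bytes are decoded with a different encoding from the one that produced them. The class and method names are hypothetical, and ISO-8859-1 merely stands in for "some other machine's default code page":

```csharp
using System;
using System.Text;

public static class EncodingMismatchDemo
{
    // Encodes text as UTF-8, then decodes the same bytes as ISO-8859-1,
    // simulating a reader whose default code page differs from the writer's.
    public static string DecodeWithWrongEncoding(string text)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(text);
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
        return latin1.GetString(bytes);
    }

    // Encodes and decodes with the same explicitly chosen encoding.
    public static string RoundTripUtf8(string text)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(text);
        return Encoding.UTF8.GetString(bytes);
    }

    public static void Main()
    {
        const string text = "caf\u00E9"; // "café"

        // The accented character occupies two bytes in UTF-8, which
        // ISO-8859-1 decodes as two separate characters.
        Console.WriteLine(DecodeWithWrongEncoding(text) == text); // False
        Console.WriteLine(RoundTripUtf8(text) == text);           // True
    }
}
```

Naming the encoding explicitly on both sides removes the dependency on whatever the machine's default happens to be.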

Secondly, XElement.Value returns:

A String that contains all of the text content of this element. If there are multiple text nodes, they will be concatenated.

Thus if you upload the Value you will be stripping away the entire XML markup structure from your file leaving only the plain text. While you might want to do that, it seems very unlikely. If you compare the value with the string returned by XElement.ToString() the difference should be clear.
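As a rough illustration using the sample data from the question, the following sketch prints both strings and the bytes that result from encoding Value. The class name and the BuildRoot helper are hypothetical:

```csharp
using System;
using System.Text;
using System.Xml.Linq;

public static class ValueVersusToStringDemo
{
    // Builds the same sample element as in the question.
    public static XElement BuildRoot() =>
        new XElement("Root",
            new XElement("Child1", 1),
            new XElement("Child2", 2),
            new XElement("Child3", 3),
            new XElement("Child4", 4),
            new XElement("Child5", 5),
            new XElement("Child6", 6));

    public static void Main()
    {
        XElement root = BuildRoot();

        // Value is only the concatenated text of all descendant text nodes.
        Console.WriteLine(root.Value); // 123456

        // ToString() returns the markup as well as the text.
        Console.WriteLine(root.ToString());

        // Encoding Value therefore yields one byte per digit.
        byte[] bytes = Encoding.UTF8.GetBytes(root.Value);
        Console.WriteLine(bytes.Length); // 6
        Console.WriteLine(bytes[0]);     // 49, the code for the character '1'
    }
}
```

This explains the byte[6] seen in the debugger: the six bytes are the digits 1 through 6, and none of the element names survived.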

Instead, to convert the XML contents of your XElement (including both markup and text) to a byte array, it would be better to write your XElement directly to a MemoryStream using an appropriately configured XmlWriterSettings and return the byte array thereby created. The following extension method does the job:

using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Linq;

public static partial class XNodeExtensions
{
    static Encoding DefaultEncoding { get; } = new UTF8Encoding(false); // Disable the BOM because XElement.ToString() does not include it.
    
    public static byte[] ToByteArray(this XNode node, SaveOptions options = default, Encoding encoding = default)
    {
        // Emulate the settings of XElement.ToString() and XDocument.ToString()
        // https://referencesource.microsoft.com/#System.Xml.Linq/System/Xml/Linq/XLinq.cs,2004
        // I omitted the XML declaration because XElement.ToString() omits it, but you might want to include it, depending upon your needs.
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true, Indent = (options & SaveOptions.DisableFormatting) == 0, Encoding = encoding ?? DefaultEncoding };
        if ((options & SaveOptions.OmitDuplicateNamespaces) != 0)
            settings.NamespaceHandling |= NamespaceHandling.OmitDuplicates;
        return node.ToByteArray(settings);
    }
    
    public static byte[] ToByteArray(this XNode node, XmlWriterSettings settings)
    {
        using var ms = new MemoryStream();
        using (var writer = XmlWriter.Create(ms, settings))
            node.WriteTo(writer);
        return ms.ToArray();
    }
}

Now you can serialize your XElement to a UTF-8-encoded byte array by doing:

var bytes = root.ToByteArray();

The extension method has the added advantage that, if you really need to use some encoding other than UTF-8, unsupported Unicode characters will be escaped rather than replaced with a fallback, as explained in this answer to "XmlDocument with Kanji text content is not encoded correctly to ISO-8859-1 using XmlTextWriter". For example, to use the default encoding:

var bytes = root.ToByteArray(encoding : Encoding.Default);
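The escaping behavior can be checked with a small self-contained sketch. It uses ASCII as a deliberately narrow target encoding, and the class and method names are hypothetical. It writes an element containing two Kanji characters, confirms that every output byte is plain ASCII, and parses the result back:

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;
using System.Xml;
using System.Xml.Linq;

public static class EscapedEncodingDemo
{
    // Writes the element through an XmlWriter configured for ASCII output.
    public static byte[] WriteAsAscii(XElement element)
    {
        var settings = new XmlWriterSettings
        {
            OmitXmlDeclaration = true,
            Encoding = Encoding.ASCII,
        };
        using var ms = new MemoryStream();
        using (var writer = XmlWriter.Create(ms, settings))
            element.WriteTo(writer);
        return ms.ToArray();
    }

    public static void Main()
    {
        // Two Kanji characters that ASCII cannot represent directly.
        var original = new XElement("Root", "\u65E5\u672C");

        byte[] bytes = WriteAsAscii(original);
        string text = Encoding.ASCII.GetString(bytes);

        // The Kanji should appear as numeric character references, not as '?'.
        Console.WriteLine(text);
        Console.WriteLine(bytes.All(b => b < 128));

        // Parsing the escaped text should recover the original content.
        Console.WriteLine(XNode.DeepEquals(original, XElement.Parse(text)));
    }
}
```

By contrast, calling Encoding.ASCII.GetBytes() on the string returned by ToString() would replace each Kanji character with a question mark, and the original text could not be recovered.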

To check for correctness, you could examine the contents of the byte array in the debugger or your console app by decoding it to a string as follows:

var resultString = Encoding.UTF8.GetString(bytes);
Console.WriteLine(resultString);

Or with the default encoding:

var resultString = Encoding.Default.GetString(bytes);

You could also assert that the contents of the byte array are correct by parsing the contents back to a new XElement and checking that the result is semantically identical to the original by using XNode.DeepEquals():

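// Decode with the same encoding that was used to create the byte array (UTF-8 by default above).
Assert.IsTrue(
    XNode.DeepEquals(root, 
                     XElement.Load(new StreamReader(new MemoryStream(bytes), Encoding.UTF8))));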

Demo fiddle here.
