Why are these NULs appearing?

Question

I used to write to a lot of different files using the following code:

using (FileStream fs = new FileStream(Settings.PsLog, FileMode.Truncate, System.Security.AccessControl.FileSystemRights.Write, FileShare.ReadWrite, 1024, FileOptions.None, null))
{
    foreach (string line in checkList)
    {
        byte[] encodedText = Encoding.Unicode.GetBytes(line + Environment.NewLine);
        await fs.WriteAsync(encodedText, 0, line.Length);
    }
}

As this code was copied and pasted all over the place, I decided to extract it into a more general function:

private static async Task WriteTextAsync(string filePath, string text)  
{
    byte[] encodedText = Encoding.Unicode.GetBytes(text + Environment.NewLine);
    using (FileStream sourceStream = new FileStream(filePath,
           FileMode.Append, FileAccess.Write, FileShare.Write,
           bufferSize: 1024, useAsync: true))
    {
        await sourceStream.WriteAsync(encodedText, 0, encodedText.Length);
    };
}

However, after switching to the extracted version, random NULs are appended to the text.

Where are these NULs coming from? I tried copying the FileStream settings one-to-one as well, but even then the NULs occurred.

Answer 1

Your original code is broken.

When Encoding.Unicode is used, line.Length is not the same thing as encodedText.Length. Encoding.Unicode is UTF-16, which takes at least two bytes per character, so passing line.Length as the byte count writes only about half of the data.

Since that doesn't actually happen in your example, the most likely reason is that you're not actually using Encoding.Unicode, but rather either Encoding.UTF8 or one of the single-byte ANSI/ASCII encodings.

In either case, make sure you write as many bytes as there are to write. The number of characters is irrelevant. And make sure you use the proper encoding: there can only be one.
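To make the mismatch concrete, here is a minimal sketch that compares the character count of a string with its encoded byte count. The class and method names (LengthDemo, Measure) are hypothetical and not part of the question's code.

```csharp
using System;
using System.Text;

static class LengthDemo
{
    // Hypothetical helper: returns the number of characters in the line
    // and the number of bytes it occupies under the given encoding.
    public static (int Chars, int Bytes) Measure(string line, Encoding encoding)
    {
        byte[] encoded = encoding.GetBytes(line);
        return (line.Length, encoded.Length);
    }
}
```

For example, LengthDemo.Measure("abc", Encoding.Unicode) yields 3 characters but 6 bytes, while the same call with Encoding.UTF8 yields 3 and 3. The count passed to WriteAsync should be the byte count.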

As a side note, your code is going to be much slower than the original as well, because it opens and closes the file on every call. This is most probably a poor trade-off. Instead, you might want to capture the whole foreach and pass an IEnumerable<string> instead of just a string. If you really only need to write a single string in some cases, you can supply a params string[] overload or whatever suits you best. And do make sure that all cases are actually equivalent. This one surely isn't: the original code discards the existing file contents (FileMode.Truncate), while your code only ever appends (FileMode.Append).
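A minimal sketch of that suggestion might look like the following. The names LogWriter and WriteLinesAsync are hypothetical, and the choice of FileMode.Create, FileShare.Read and a 4096-byte buffer is an assumption made for illustration; adjust them to whatever the call sites actually need.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Threading.Tasks;

static class LogWriter
{
    // Hypothetical helper: opens the file once and writes every line
    // through the same stream. FileMode.Create discards existing content,
    // similar to FileMode.Truncate, but also creates a missing file.
    public static async Task WriteLinesAsync(string filePath, IEnumerable<string> lines)
    {
        using (var stream = new FileStream(filePath, FileMode.Create, FileAccess.Write,
                                           FileShare.Read, bufferSize: 4096, useAsync: true))
        {
            foreach (string line in lines)
            {
                byte[] encoded = Encoding.Unicode.GetBytes(line + Environment.NewLine);
                // The count is in bytes, so use the buffer length, not line.Length.
                await stream.WriteAsync(encoded, 0, encoded.Length);
            }
        }
    }

    // Convenience overload for one or a few strings.
    public static Task WriteLinesAsync(string filePath, params string[] lines)
    {
        // The cast selects the IEnumerable<string> overload above.
        return WriteLinesAsync(filePath, (IEnumerable<string>)lines);
    }
}
```

A call site would then be either await LogWriter.WriteLinesAsync(Settings.PsLog, checkList) for a collection, or await LogWriter.WriteLinesAsync(path, "single line") for one string.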

Answer 2

Perhaps you're writing UTF-16 output?

Elaboration:

In both the first and second blocks of code in your question you are using Encoding.Unicode, which encodes strings to little-endian UTF-16 byte representations. The little-endian UTF-16 representation of an ASCII character such as 0 or G contains the usual ASCII byte as the first byte, then 0 (NUL) as the second byte. This is the likely source of the NUL bytes in the output.

As for why NUL did not appear in the output from the first block of code, I am not sure. Please post an input string that does not produce NUL bytes with the first code block but does produce them with the second, so that the cause can be confirmed.
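The byte layout described above can be inspected directly. This is a small sketch with a hypothetical helper (Utf16Demo.ToHex) that dumps the bytes Encoding.Unicode produces for a string:

```csharp
using System;
using System.Text;

static class Utf16Demo
{
    // Hypothetical helper: returns the UTF-16 little-endian bytes of the
    // text as hyphen-separated hex pairs.
    public static string ToHex(string text)
    {
        return BitConverter.ToString(Encoding.Unicode.GetBytes(text));
    }
}
```

Utf16Demo.ToHex("G0") returns "47-00-30-00": each ASCII character is followed by a 00 byte, which a text editor that assumes a single-byte encoding will display as NUL.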

Answer 3

Have you tried increasing bufferSize? Check whether changing it affects where the NULs start to appear.

I'm also not sure what changed between the loop that ran through all the lines and the single method that now produces the output. You don't have multiple threads writing to this file at the same time, right?

3 answers

solution1
4 ACCEPTED 2016-05-14 22:20:15

solution2
0 2016-05-14 22:03:00

solution3
0 2016-05-14 22:13:37
