
C# StreamWriter - Problem with the encoding

I have some product data that I want to write into a CSV file. First, I have a function that writes the header into the CSV file:

using (StreamWriter streamWriter = new StreamWriter(path))
{
    string[] headerContent = {"banana","apple","orange"};
    string header = string.Join(",", headerContent);
    streamWriter.WriteLine(header);
}
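
Note that the StreamWriter(path) constructor used here, with no encoding argument, defaults to UTF-8 without a byte order mark. Spelled out explicitly (same behavior, reusing path from the snippet above), the header write is:

using System.Text;

// Equivalent to the snippet above, with StreamWriter's implicit default made
// visible: UTF-8 *without* a BOM (new UTF8Encoding(false)).
using (StreamWriter streamWriter = new StreamWriter(path, false, new UTF8Encoding(false)))
{
    string[] headerContent = { "banana", "apple", "orange" };
    streamWriter.WriteLine(string.Join(",", headerContent));
}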

Another function goes over the products and writes their data into the CSV file:

using (StreamWriter streamWriter = new StreamWriter(new FileStream(path, FileMode.Open), Encoding.UTF8))
{
    foreach (var product in products)
    {
        await streamWriter.WriteLineAsync(product.ToString());
    }
}

When I write the products into the CSV file with FileMode.Open and Encoding.UTF8, the encoding is set correctly in the file, meaning that special characters in German or French are shown correctly. But the problem is that this overwrites my header.
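
That overwrite is byte-level, not line-level: FileMode.Open positions the stream at byte 0, so new writes (including the 3-byte BOM that Encoding.UTF8 emits at position 0) overlay the start of the existing file. A scratch demonstration of the effect, with a hypothetical file name:

using System.Text;

string demo = "demo.txt"; // hypothetical scratch file
File.WriteAllText(demo, "banana,apple,orange");
using (StreamWriter w = new StreamWriter(new FileStream(demo, FileMode.Open), Encoding.UTF8))
{
    w.Write("XY"); // BOM (3 bytes) + "XY" (2 bytes) overlay the first 5 bytes
}
Console.WriteLine(File.ReadAllText(demo)); // -> "XYa,apple,orange"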

The solution I tried was to use FileMode.Append instead of FileMode.Open, which keeps the header, but then for some reason the encoding just gets ignored.

What could I do to append the data while maintaining the encoding? And why is this happening in the first place?

EDIT:

Example with FileMode.Open:

Fußpflegecreme

Example with FileMode.Append:

FuÃŸpflegecreme

The important question here is: what does the file actually contain? For example, if I use the following:

using System.Text;

string path = "my.txt";
using (StreamWriter streamWriter = new StreamWriter(new FileStream(path, FileMode.Create), Encoding.UTF8))
{
    streamWriter.WriteLine("Fußpflegecreme 1");
}
using (StreamWriter streamWriter = new StreamWriter(new FileStream(path, FileMode.Append), Encoding.UTF8))
{
    streamWriter.WriteLine("Fußpflegecreme 2");
}
// this next line is lazy and inefficient; only good for quick tests
Console.WriteLine(BitConverter.ToString(File.ReadAllBytes(path)));

then the output is (re-formatted a little):

EF-BB-BF-
46-75-C3-9F-70-66-6C-65-67-65-63-72-65-6D-65-20-31-0D-0A-
46-75-C3-9F-70-66-6C-65-67-65-63-72-65-6D-65-20-32-0D-0A

The first line (note: there aren't any "lines" in the original hex) is the UTF-8 BOM; the second and third lines are the correctly UTF-8-encoded payloads. It would help if you could show the exact bytes that get written in your case. I wonder if the real problem here is that in your version there is no BOM, but the rest of the data is correct; StreamWriter only writes the encoding preamble when the underlying stream starts at position 0, so with FileMode.Append on a non-empty file no BOM is emitted, even though the characters themselves are still UTF-8. Some tools, in the absence of a BOM, will choose the wrong encoding. But also, some tools, in the presence of a BOM, will incorrectly show some garbage at the start of the file (and may also, because they're clearly not using the BOM, use the wrong encoding). The preferred option is to specify the encoding explicitly when reading the file, and to use a tool that can handle the presence or absence of a BOM.
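
In code, the reading side of that advice looks like this (a minimal sketch against the my.txt file from the sample above):

using System.Text;

// Name the encoding up front; detectEncodingFromByteOrderMarks lets the
// reader consume a BOM if one is present instead of showing it as a character.
using (StreamReader reader = new StreamReader("my.txt", Encoding.UTF8,
    detectEncodingFromByteOrderMarks: true))
{
    Console.WriteLine(reader.ReadToEnd());
}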

Whether or not to include a BOM (especially in the case of UTF-8) is a complex question, and there are pros and cons to each choice; some tools work better with one than with the other. A lot of UTF-8 text files do not include a BOM, but there is no universal answer. The actual content is still correctly UTF-8 encoded whether or not there is a BOM; how it is interpreted (in either case) is up to the specific tool that you're using to read the data, and how that tool is configured.
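
In .NET you can pick either side of that trade-off explicitly: Encoding.UTF8 emits the 3-byte BOM when the stream starts at position 0, while new UTF8Encoding(false) has an empty preamble and never does. A small sketch, with hypothetical file names:

using System.Text;

// Both files hold valid UTF-8; they differ only in the 3-byte preamble.
using (var withBom = new StreamWriter(new FileStream("bom.txt", FileMode.Create), Encoding.UTF8))
{
    withBom.WriteLine("Fußpflegecreme");   // file starts EF-BB-BF-46-75-C3-9F-...
}
using (var noBom = new StreamWriter(new FileStream("nobom.txt", FileMode.Create), new UTF8Encoding(false)))
{
    noBom.WriteLine("Fußpflegecreme");     // file starts 46-75-C3-9F-...
}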

I think this will be solved once you explicitly choose the UTF-8 encoding (Encoding.UTF8) when writing the header as well. This will prefix the file with a BOM.
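
A minimal sketch of that fix, reusing path and products from the question: create the file and write the header with Encoding.UTF8 (the BOM is emitted here, because the stream starts at position 0), then append the rows with the same encoding (no second BOM appears, since the stream is past position 0, but the characters are still UTF-8):

using System.Text;

// Header: FileMode.Create + Encoding.UTF8 -> the BOM is written once, here.
using (StreamWriter streamWriter = new StreamWriter(new FileStream(path, FileMode.Create), Encoding.UTF8))
{
    streamWriter.WriteLine(string.Join(",", new[] { "banana", "apple", "orange" }));
}

// Rows: FileMode.Append -> position > 0, so StreamWriter skips the preamble,
// but every character is still encoded as UTF-8.
using (StreamWriter streamWriter = new StreamWriter(new FileStream(path, FileMode.Append), Encoding.UTF8))
{
    foreach (var product in products)
    {
        await streamWriter.WriteLineAsync(product.ToString());
    }
}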
