
C# StreamWriter - Problem with the encoding

I have some product data that I want to write into a CSV file. First, I have a function that writes the header into the CSV file:

using(StreamWriter streamWriter = new StreamWriter(path))
{
    string[] headerContent = {"banana","apple","orange"};
    string header = string.Join(",", headerContent);
    streamWriter.WriteLine(header);
}

Another function goes over the products and writes their data into the CSV file:

using (StreamWriter streamWriter = new StreamWriter(new FileStream(path, FileMode.Open), Encoding.UTF8))
{
    foreach (var product in products)
    {
        await streamWriter.WriteLineAsync(product.ToString());
    }
}

When I write the products into the CSV file with FileMode.Open and Encoding.UTF8, the encoding is applied correctly, meaning that special characters in German or French are shown correctly. The problem is that this overwrites my header.

The solution I tried was to use FileMode.Append instead of FileMode.Open. That works, but then for some reason the encoding just gets ignored.

What can I do to append the data while maintaining the encoding? And why is this happening in the first place?

EDIT:

Example with FileMode.Open:

Fußpflegecreme

Example with FileMode.Append:

Fußpflegecreme

The important question here is: what does the file actually contain? For example, if I use the following:

using System.Text;

string path = "my.txt";
using (StreamWriter streamWriter = new StreamWriter(new FileStream(path, FileMode.Create), Encoding.UTF8))
{
    streamWriter.WriteLine("Fußpflegecreme 1");
}
using (StreamWriter streamWriter = new StreamWriter(new FileStream(path, FileMode.Append), Encoding.UTF8))
{
    streamWriter.WriteLine("Fußpflegecreme 2");
}
// this next line is lazy and inefficient; only good for quick tests
Console.WriteLine(BitConverter.ToString(File.ReadAllBytes(path)));

then the output is (re-formatted a little):

EF-BB-BF-
46-75-C3-9F-70-66-6C-65-67-65-63-72-65-6D-65-20-31-0D-0A-
46-75-C3-9F-70-66-6C-65-67-65-63-72-65-6D-65-20-32-0D-0A

The first line (note: there aren't any "lines" in the original hex) is the UTF-8 BOM; the second and third lines are the correctly UTF-8 encoded payloads. It would help if you could show the exact bytes that get written in your case. I wonder if the real problem here is that in your version there is no BOM, but the rest of the data is correct. Some tools, in the absence of a BOM, will choose the wrong encoding. But also, some tools, in the presence of a BOM, will incorrectly show some garbage at the start of the file (and, because they're clearly not using the BOM, may also use the wrong encoding). The preferred option is: specify the encoding explicitly when reading the file, and use a tool that can handle the presence or absence of a BOM.
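To check which situation you are in, something like the following might help. It is only a rough sketch: `StartsWithUtf8Bom` is a hypothetical helper name (not a framework method), and the temp-file name is made up for the demo. It relies on `new StreamWriter(path)` defaulting to UTF-8 without a BOM, and on `File.ReadAllText` accepting an explicit encoding.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

// Hypothetical helper: true if the file starts with the UTF-8 BOM (EF BB BF).
static bool StartsWithUtf8Bom(string path)
{
    byte[] bom = Encoding.UTF8.GetPreamble();
    using FileStream stream = File.OpenRead(path);
    byte[] head = new byte[bom.Length];
    int read = stream.Read(head, 0, head.Length);
    return read == bom.Length && head.SequenceEqual(bom);
}

// Demo file name is an assumption; any writable path works.
string path = Path.Combine(Path.GetTempPath(), "bom-check-demo.csv");

// No encoding argument: StreamWriter writes UTF-8 without a BOM.
using (var writer = new StreamWriter(path))
{
    writer.WriteLine("Fußpflegecreme");
}

bool hasBomDefault = StartsWithUtf8Bom(path);
Console.WriteLine($"BOM present: {hasBomDefault}");

// Reading with an explicit encoding decodes correctly with or without a BOM.
string text = File.ReadAllText(path, Encoding.UTF8);
Console.WriteLine(text);
```

If the check reports no BOM but the explicit read prints the text correctly, the bytes are fine and the issue is in how the viewing tool guesses the encoding.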

Whether or not to include a BOM (especially in the case of UTF-8) is a complex question; each choice has pros and cons, and there are tools that work better, or worse, with each. A lot of UTF-8 text files do not include a BOM, but there is no universal answer. The actual content is still correctly UTF-8 encoded whether or not there is a BOM, but how it is interpreted (in either case) is up to the specific tool that you're using to read the data (and how that tool is configured).
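If you want to make the choice explicit rather than rely on defaults, one option is the `UTF8Encoding` constructor, whose `encoderShouldEmitUTF8Identifier` argument controls the BOM. A minimal sketch (the file names are invented for the demo):

```csharp
using System;
using System.IO;
using System.Text;

// Demo file names are assumptions; any writable paths work.
string dir = Path.GetTempPath();
string withBom = Path.Combine(dir, "with-bom-demo.txt");
string withoutBom = Path.Combine(dir, "without-bom-demo.txt");

// Same text, same UTF-8 payload; only the BOM setting differs.
File.WriteAllText(withBom, "Fußpflegecreme",
    new UTF8Encoding(encoderShouldEmitUTF8Identifier: true));
File.WriteAllText(withoutBom, "Fußpflegecreme",
    new UTF8Encoding(encoderShouldEmitUTF8Identifier: false));

string hexWithBom = BitConverter.ToString(File.ReadAllBytes(withBom));
string hexWithoutBom = BitConverter.ToString(File.ReadAllBytes(withoutBom));
Console.WriteLine(hexWithBom);
Console.WriteLine(hexWithoutBom);
```

The two dumps should differ only by the leading EF-BB-BF; the payload bytes (including C3-9F for "ß") are identical in both files.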

I think this will be solved once you explicitly choose the UTF-8 encoding when writing the header. This will prefix the file with a BOM.
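Roughly, that could look like the sketch below. The path and the sample row are placeholders, not the asker's real data. The idea is that the BOM is only emitted when the writer starts at the beginning of the file, so the header write produces it and the later append does not add a second one.

```csharp
using System;
using System.IO;
using System.Text;

// Demo path is an assumption; any writable path works.
string path = Path.Combine(Path.GetTempPath(), "products-demo.csv");

// Header: the file is created fresh, so Encoding.UTF8 emits the BOM here.
using (var writer = new StreamWriter(path, append: false, Encoding.UTF8))
{
    writer.WriteLine(string.Join(",", "banana", "apple", "orange"));
}

// Rows: appending starts at the end of the file, so no second BOM is written.
using (var writer = new StreamWriter(path, append: true, Encoding.UTF8))
{
    writer.WriteLine("Fußpflegecreme");
}

byte[] bytes = File.ReadAllBytes(path);
Console.WriteLine(BitConverter.ToString(bytes, 0, 3)); // first three bytes of the file
```

With the BOM in place from the header write, tools that depend on it to detect UTF-8 should then show the appended rows correctly as well.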
