I think this is some kind of encoding problem

I have two computers. Both are running WinXP SP2 (I don't really know how similar they are beyond that). I am running MS Visual C# 2008 Express Edition on both, and that's what I'm currently using to program.

I made an application that loads in an XML file and displays the contents in a DataGridView.

The first line of my XML file is:

<?xml version="1.0" encoding="utf-8"?>

...and really... it's utf-8 (at least according to MS VS C# when I just open the file there).

I compile the code and run it on one computer, and the contents of my DataGridView appear normal. No funny characters. I compile the code and run it on the other computer (or just take the published version from computer #1 and install it on computer #2; I tried it both ways) and in the DataGridView, wherever there are line breaks/new lines in the XML file, I see funny square characters.

I'm a novice to encoding... so the only thing I really tried to troubleshoot was to use that same program to write the contents of my XML to a new XML file (actually a text file with the XML tags in it), since the default for writing to a text file seems to be utf-8. Then I read this new file back in to my program. I get the same results.

I don't know what else to do or how to troubleshoot this or what I might fundamentally be doing wrong in the first place.

-Adeena

This doesn't have to do with UTF-8 or character encodings; this problem has to do with line endings. In Windows, each line of a text file ends in the two characters carriage return (CR) and newline (LF, for line feed), which are code points U+000D and U+000A respectively. In ASCII and UTF-8, these are encoded as the two bytes 0D 0A. Most non-Windows systems, including Linux and Mac OS X, on the other hand, use just a newline character to signal end-of-line, so it's not uncommon to see line ending problems when transferring text files between Windows and non-Windows systems.

However, since you're using just Windows on both systems, this is more of a mystery. One application is correctly interpreting the CRLF combination as a newline, but the other application is confused by the CR. Carriage returns are not printable characters, so it replaces the CR with a placeholder box, which is what you see; it then correctly interprets the line feed as the end-of-line.
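To check whether a stray CR is really what ends up in the grid, it can help to make the invisible characters visible before displaying the string. A minimal sketch follows; the class and method names (`LineEndingDebug`, `ShowControlChars`) are made up for illustration and are not part of the original program.

```csharp
using System.Text;

// Hypothetical debugging helper: not from the original program.
public static class LineEndingDebug
{
    // Returns a copy of the text in which CR and LF are spelled out as
    // the two-character sequences \r and \n, and any other control
    // character is written as \uXXXX.
    public static string ShowControlChars(string text)
    {
        var sb = new StringBuilder();
        foreach (char c in text)
        {
            if (c == '\r') sb.Append("\\r");
            else if (c == '\n') sb.Append("\\n");
            else if (char.IsControl(c)) sb.Append("\\u" + ((int)c).ToString("X4"));
            else sb.Append(c);
        }
        return sb.ToString();
    }
}
```

Running a cell value through this on both computers, for example by writing the result to a log or a message box, should show whether the two machines actually see the same characters.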

The square usually appears when you use different types of newlines. 当您使用不同类型的换行符时,通常会显示正方形。

  • Linux - (0A) LF
  • Win - (0D0A) CRLF
  • Mac (classic Mac OS, before OS X) - (0D) CR

The app was probably created using one type and the running app is expecting another.
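To find out which of the three conventions a given string actually uses, you could count each kind of line ending. This is only a sketch; `NewlineCounter` and `Count` are hypothetical names chosen for this example.

```csharp
// Hypothetical helper: counts the three kinds of line ending in a string.
public static class NewlineCounter
{
    // A CR immediately followed by LF counts as one CRLF, not as a
    // separate CR and LF.
    public static void Count(string text, out int crlf, out int cr, out int lf)
    {
        crlf = 0;
        cr = 0;
        lf = 0;
        for (int i = 0; i < text.Length; i++)
        {
            if (text[i] == '\r')
            {
                if (i + 1 < text.Length && text[i + 1] == '\n')
                {
                    crlf++;
                    i++; // skip the LF that belongs to this CRLF pair
                }
                else
                {
                    cr++;
                }
            }
            else if (text[i] == '\n')
            {
                lf++;
            }
        }
    }
}
```

If the counts differ between the two machines for the same file, the difference is introduced when the file is read; if they match, the difference is more likely in how the control displays the text.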


Check out Environment.NewLine

And you might try this (no guarantees; I don't write much C#):

strInput = Regex.Replace(strInput, "\r\n|\r|\n", Environment.NewLine);
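One thing to watch for with this approach: a pattern in which every part is optional can also match the empty string, which would insert a newline between every pair of characters. The sketch below uses an explicit alternation instead, with CRLF listed first so the pair is replaced as a unit. The `LineEndings` class and its method names are invented for this example.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper wrapping the Regex.Replace idea above.
public static class LineEndings
{
    // Replaces CRLF, a lone CR, or a lone LF with the given newline string.
    public static string Normalize(string text, string newline)
    {
        return Regex.Replace(text, "\r\n|\r|\n", newline);
    }

    // Convenience overload that targets the current platform's convention.
    public static string NormalizeToPlatform(string text)
    {
        return Normalize(text, Environment.NewLine);
    }
}
```

For example, `LineEndings.Normalize("a\r\nb\rc\nd", "\n")` turns all three kinds of break into a single LF each.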

I'm not sure of the cause of your problem, but one solution would be to just strip out the carriage returns from your strings. For every string you add, just call TrimEnd(null) on it to remove trailing whitespace:

newrow["topic"] = att1.ToString().TrimEnd(null);

If your strings might end in other whitespace (i.e. spaces or tabs) and you want to keep those, then just pass an array containing only the carriage return character to TrimEnd:

newrow["topic"] = att1.ToString().TrimEnd(new char[] { '\r' });

Disclaimer: I am not a C# programmer; the second statement may be syntactically incorrect.
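To make the difference between the two calls concrete, here is a small sketch. `TrimDemo` and its method names are hypothetical, and it calls `TrimEnd()` with no arguments, which trims whitespace the same way as `TrimEnd(null)`.

```csharp
// Hypothetical helper contrasting the two TrimEnd variants.
public static class TrimDemo
{
    // Removes all trailing whitespace: spaces, tabs, CR and LF.
    public static string TrimAllTrailing(string s)
    {
        return s.TrimEnd();
    }

    // Removes only trailing carriage returns and keeps other whitespace.
    public static string TrimTrailingCr(string s)
    {
        return s.TrimEnd('\r');
    }
}
```

Note that both variants only look at the end of the string. A CR in the middle of a value, as in "a\r\nb", is left alone, and so is the CR in a value that ends in "\r\n" when only '\r' is trimmed, so values containing embedded line breaks may need a replace instead of a trim.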

@Adam: Sorry! Missed your earlier statement.

To load the document into the program and display it in the DataGridView, I am currently doing the following (I say "currently" because I tried other things, like using XDocument instead of XElement):

XElement xe1 = XElement.Load(filePath);

DataTable myTable = new DataTable();
myTable = mkTable();   // calls a function that makes the table
var _categories = (from p1 in xe1.Descendants("category") select p1);
int numCat = _categories.Count();
int i = 0;

while (i < numCat)
{
    DataRow newrow;
    newrow = myTable.NewRow();

    if (_categories.ElementAt(i).Parent.Name == "topic")
    {
        string att1 = _categories.ElementAt(i).Parent.Attribute("name").Value.ToString();
        newrow["topic"] = att1.ToString();
    }
    // repeat the above for the different things in my document
    myTable.Rows.Add(newrow);

    i++;
}
myDataSet.Merge(myTable);
bindingSourceIn.DataSource = myDataSet;
myDataGridView.DataSource = bindingSourceIn;
myDataGridView.DataMember = "xmlthing";

(Obviously things are a little abbreviated here, i.e. my binding source, DataGridView, etc. are declared elsewhere, but hopefully this is enough to make sense.)
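For what it's worth, here is one way the loop above might look with the carriage returns stripped as each row is filled. This is a sketch under assumptions, not the original program: the `CategoryTable` class and `Build` method are invented, the table has only the "topic" column, and whether removing CR (rather than converting to Environment.NewLine) is the right choice depends on how the grid renders the cell. It also iterates with foreach instead of calling ElementAt(i) repeatedly.

```csharp
using System.Data;
using System.Xml.Linq;

// Hypothetical rewrite of the row-filling loop; names are assumptions.
public static class CategoryTable
{
    // Adds one row per <category> element. When the parent is a <topic>
    // with a "name" attribute, that value is stored with CRs removed.
    public static DataTable Build(XElement root)
    {
        var table = new DataTable("xmlthing");
        table.Columns.Add("topic", typeof(string));

        foreach (XElement category in root.Descendants("category"))
        {
            DataRow row = table.NewRow();

            XElement parent = category.Parent;
            if (parent != null && parent.Name == "topic")
            {
                XAttribute name = parent.Attribute("name");
                if (name != null)
                {
                    // Strip carriage returns so only LF remains as the break.
                    row["topic"] = name.Value.Replace("\r", "");
                }
            }

            table.Rows.Add(row);
        }
        return table;
    }
}
```

Calling `CategoryTable.Build(XElement.Load(filePath))` would then give a table that can be merged into the data set as before.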

-Adeena


 