
LINQ to XML ignores line breaks in attributes

According to this question:

Are line breaks in XML attribute values allowed?

line breaks in XML attributes are perfectly valid (although perhaps not recommended):

<xmltag1>
    <xmltag2 attrib="line 1
line 2
line 3">
    </xmltag2>
</xmltag1>

When I parse such XML using LINQ to XML (System.Xml.Linq), those line breaks are silently converted to space (' ') characters.

Is there any way to tell the XDocument.Load() parser to preserve those line breaks?

PS: The XML I'm parsing is written by third-party software, so I cannot change the way the line breaks are written.

If you want line breaks in attribute values to be preserved, then you need to write them as character references, e.g.

<foo bar="Line 1.&#10;Line 2.&#10;Line3."/>

as otherwise the XML parser will normalize them to spaces, according to the XML specification: http://www.w3.org/TR/xml/#AVNormalize
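A minimal sketch of that difference, assuming a console program that references System.Xml.Linq. The `NormalizationDemo` class and `ReadBar` helper are made up for illustration; the element and attribute names are the ones from the example above.

```csharp
using System;
using System.Xml.Linq;

public static class NormalizationDemo
{
    // Hypothetical helper: parses the XML with the default LINQ to XML
    // reader and returns the value of the root element's "bar" attribute.
    public static string ReadBar(string xml)
    {
        return XDocument.Parse(xml).Root.Attribute("bar").Value;
    }

    public static void Main()
    {
        // A literal line feed inside the attribute value is normalized to a space.
        string literal = ReadBar("<foo bar=\"Line 1.\nLine 2.\"/>");
        // The same line break written as a character reference survives parsing.
        string escaped = ReadBar("<foo bar=\"Line 1.&#10;Line 2.\"/>");

        Console.WriteLine(literal.Contains("\n")); // False
        Console.WriteLine(escaped.Contains("\n")); // True
    }
}
```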

[edit] If you want to avoid the attribute value normalization, then loading the XML with a legacy XmlTextReader helps:

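// Requires: using System; using System.IO; using System.Xml; using System.Xml.Linq;
string testXml = @"<foo bar=""Line 1.
Line 2.
Line 3.""/>";

XDocument test;
using (XmlTextReader xtr = new XmlTextReader(new StringReader(testXml)))
{
    // Turn off attribute value normalization so the line breaks survive.
    xtr.Normalization = false;
    test = XDocument.Load(xtr);
}
Console.WriteLine("|{0}|", test.Root.Attribute("bar").Value);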

That outputs

|Line 1.
Line 2.
Line 3.|
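Since the XML in the question comes from third-party software, the same idea could be wrapped in a small helper. This is only a sketch: `RawAttributeLoader` is a hypothetical name, and prohibiting DTDs is an assumption about the input that would need to be relaxed if the files carry a DOCTYPE. Note also that, per the XmlTextReader.Normalization documentation, turning normalization off disables character range checking for numeric character entities as well.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class RawAttributeLoader
{
    // Hypothetical helper (not part of any library): loads XML through a
    // legacy XmlTextReader so literal line breaks in attribute values survive.
    public static XDocument Load(TextReader input)
    {
        using (XmlTextReader reader = new XmlTextReader(input))
        {
            // Documented default is false; set explicitly to make the intent clear.
            reader.Normalization = false;
            // Assumption: the input has no DTD, so reject one if it appears.
            reader.DtdProcessing = DtdProcessing.Prohibit;
            return XDocument.Load(reader);
        }
    }

    public static void Main()
    {
        XDocument doc = Load(new StringReader("<foo bar=\"Line 1.\nLine 2.\"/>"));
        // Prints True when the line feed was preserved.
        Console.WriteLine(doc.Root.Attribute("bar").Value.Contains("\n"));
    }
}
```

For a file on disk, a StreamReader over the file (for example `File.OpenText(path)`) can be passed as the `input` argument.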

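After parsing, the line breaks are not spaces (not ASCII code 32). If you step through the value character by character, you will see that the "space" is ASCII code 10 = LF (line feed) (!!), so the line breaks are still there. If you need them, try replacing them with ASCII 13 in your code (a Windows Forms TextBox does not display LF as a line break).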
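A rough sketch of that check and replacement. The `LineBreakDisplay` class and `ToCrLf` helper are made up for illustration, and the sample value is hard-coded rather than taken from a parser. It swaps in CR+LF (13, 10) rather than CR alone, on the assumption that the text is headed for a Windows control that expects that pair.

```csharp
using System;
using System.Linq;

public static class LineBreakDisplay
{
    // Hypothetical helper: turns every bare LF into CR+LF, leaving
    // line breaks that are already CR+LF unchanged.
    public static string ToCrLf(string value)
    {
        return value.Replace("\r\n", "\n").Replace("\n", "\r\n");
    }

    public static void Main()
    {
        string value = "Line 1.\nLine 2.";

        // Print the numeric code of every character; a preserved line
        // break shows up as 10, a normalized one as 32.
        Console.WriteLine(string.Join(" ", value.Select(c => ((int)c).ToString())));

        string forTextBox = ToCrLf(value);
        Console.WriteLine(forTextBox.Contains("\r\n")); // True
    }
}
```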

According to MSDN:

Although XML processors preserve all white space in element content, they frequently normalize it in attribute values. Tabs, carriage returns, and spaces are reported as single spaces. In certain types of attributes, they trim white space that comes before or after the main body of the value and reduce white space within the value to single spaces. (If a DTD is available, this trimming will be performed on all attributes that are not of type CDATA.)

For example, an XML document might contain the following:

<whiteSpaceLoss note1="this is a note." note2="this
is a note.">

An XML parser reports both attribute values as "this is a note.", converting the line breaks to single spaces.

I can't find anything about preserving whitespace in attributes, but I guess it may be impossible according to this explanation.
