
LINQ to XML ignores line breaks in attributes

According to this question:

Are line breaks in XML attribute values allowed?

Line breaks in XML attributes are perfectly valid (although perhaps not recommended):

<xmltag1>
    <xmltag2 attrib="line 1
line 2
line 3">
    </xmltag2>
</xmltag1>

When I parse such XML using LINQ to XML (System.Xml.Linq), those line breaks are silently converted to space (' ') characters.

Is there any way to tell the XDocument.Load() parser to preserve those line breaks?

PS: The XML I'm parsing is written by third-party software, so I cannot change the way the line breaks are written.

If you want line breaks in attribute values to be preserved, you need to write them as character references, e.g.

<foo bar="Line 1.&#10;Line 2.&#10;Line3."/>

as otherwise the XML parser normalizes them to spaces, as required by the XML specification (http://www.w3.org/TR/xml/#AVNormalize).
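To see the difference in practice, here is a small console sketch (C# top-level statements, assuming .NET 5 or later) that parses the same value written once with literal line breaks and once with &#10; references. Under the spec's normalization rules, a literal whitespace character is replaced by a space, while a character reference contributes the character it names. The last part assumes LINQ to XML's default writer settings, which as far as I know escape line feeds in attribute values when saving, so a value read this way should survive a save/reload cycle.

```csharp
using System;
using System.Xml.Linq;

// Literal line breaks inside the attribute value: the parser applies
// attribute-value normalization and reports each one as a space.
string literalXml = "<foo bar=\"Line 1.\nLine 2.\nLine 3.\"/>";

// Same value written with &#10; character references: a reference is
// appended as the character it names, so the line feeds survive.
string escapedXml = "<foo bar=\"Line 1.&#10;Line 2.&#10;Line 3.\"/>";

string literalValue = XElement.Parse(literalXml).Attribute("bar")!.Value;
string escapedValue = XElement.Parse(escapedXml).Attribute("bar")!.Value;

Console.WriteLine("|{0}|", literalValue); // line breaks reported as spaces
Console.WriteLine("|{0}|", escapedValue); // line breaks preserved

// Writing the value back out: the default writer settings escape the
// line feeds in the attribute, so re-parsing yields the same string.
var roundTrip = new XElement("foo", new XAttribute("bar", escapedValue));
string serialized = roundTrip.ToString(SaveOptions.DisableFormatting);
Console.WriteLine(serialized);

string reparsed = XElement.Parse(serialized).Attribute("bar")!.Value;
Console.WriteLine(reparsed == escapedValue);
```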

[edit] If you want to avoid attribute-value normalization, loading the XML through a legacy XmlTextReader helps:

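    using System;
    using System.IO;
    using System.Xml;
    using System.Xml.Linq;

    string testXml = @"<foo bar=""Line 1.
Line 2.
Line 3.""/>";

    XDocument test;
    // With Normalization set to false (also XmlTextReader's default, made explicit here),
    // the reader skips attribute-value normalization, so the literal line breaks
    // survive into the XAttribute value.
    using (XmlTextReader xtr = new XmlTextReader(new StringReader(testXml)))
    {
        xtr.Normalization = false;
        test = XDocument.Load(xtr);
    }
    Console.WriteLine("|{0}|", test.Root.Attribute("bar").Value);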

That outputs

|Line 1.
Line 2.
Line 3.|
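Since the XML in the question comes from files written by third-party software, the same idea can be wrapped in a small helper. `LoadPreservingAttributeLineBreaks` is a hypothetical name, not a framework API; the demo feeds it a string with Windows-style CR LF line breaks to stand in for such a file (for a real file you could pass `File.OpenText(path)` instead). One assumption worth checking: with Normalization off, the reader may also leave CR LF pairs untouched, so the sketch folds them to plain LF afterwards.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper: loads XML with attribute-value normalization switched off.
static XDocument LoadPreservingAttributeLineBreaks(TextReader input)
{
    using var xtr = new XmlTextReader(input);
    xtr.Normalization = false; // keep literal line breaks in attribute values
    return XDocument.Load(xtr);
}

// Simulate a file saved with Windows (CR LF) line endings.
string xml = "<foo bar=\"Line 1.\r\nLine 2.\r\nLine 3.\"/>";

XDocument doc = LoadPreservingAttributeLineBreaks(new StringReader(xml));
string raw = doc.Root!.Attribute("bar")!.Value;

// With normalization off, end-of-line handling may be skipped as well, so the
// value can still contain CR LF pairs; fold them to plain LF for consistency.
string value = raw.Replace("\r\n", "\n");

foreach (string line in value.Split('\n'))
    Console.WriteLine("[{0}]", line);
```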

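The line breaks are not spaces after parsing (not ASCII code 32). If you step through each character, you'll see that the "space" is ASCII code 10 = LF (line feed) (!!), so the line breaks are still there. If you need them, try replacing them in your code with ASCII 13... (a Windows Forms TextBox does not display LF as a line break).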
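Rather than stepping through the value in the debugger, you can print the character codes directly; the sketch below also shows the conversion for display. It assumes the target is a Windows Forms TextBox, which breaks lines on CR LF pairs, so it swaps each LF for CR LF rather than a bare CR. The stand-in value is built with a &#10; reference so that it definitely contains a line feed; whether a value loaded from your own file holds 10 or 32 depends on how it was loaded.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Stand-in for an attribute value read from the document.
string value = XElement.Parse("<foo bar=\"Line 1.&#10;Line 2.\"/>").Attribute("bar")!.Value;

// Print each character's code: 32 is a space, 10 is LF, 13 is CR.
Console.WriteLine(string.Join(" ", value.Select(c => (int)c)));

// Assumed display target: a Windows Forms TextBox, which needs CR LF.
// Fold any existing CR LF to LF first so pairs are not doubled, then expand.
string display = value.Replace("\r\n", "\n").Replace("\n", "\r\n");
Console.WriteLine(display.Contains("\r\n")); // e.g. textBox.Text = display;
```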

According to MSDN:

Although XML processors preserve all white space in element content, they frequently normalize it in attribute values. Tabs, carriage returns, and spaces are reported as single spaces. In certain types of attributes, they trim white space that comes before or after the main body of the value and reduce white space within the value to single spaces. (If a DTD is available, this trimming will be performed on all attributes that are not of type CDATA.)

For example, an XML document might contain the following:

 <whiteSpaceLoss note1="this is a note." note2="this
is
a
note.">

An XML parser reports both attribute values as "this is a note.", converting the line breaks to single spaces.
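The MSDN example can be reproduced with LINQ to XML. This sketch assumes, as the sentence above implies, that note2 spreads the same words over several lines; the element is made self-closing so it parses on its own, and note3 is a hypothetical extra attribute using tabs, which the quoted text says are also reported as spaces.

```csharp
using System;
using System.Xml.Linq;

// note2 uses line feeds and note3 (hypothetical) uses tabs between the words.
string xml = "<whiteSpaceLoss note1=\"this is a note.\" " +
             "note2=\"this\nis\na\nnote.\" " +
             "note3=\"this\tis\ta\tnote.\"/>";

XElement e = XElement.Parse(xml);
string note1 = e.Attribute("note1")!.Value;
string note2 = e.Attribute("note2")!.Value;
string note3 = e.Attribute("note3")!.Value;

// All three come back identical: each whitespace character became one space.
Console.WriteLine("|{0}|", note1);
Console.WriteLine("|{0}|", note2);
Console.WriteLine("|{0}|", note3);
Console.WriteLine(note1 == note2 && note2 == note3);
```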

I can't find anything about preserving whitespace in attribute values, but based on this explanation I suspect it may be impossible.
