
Strip off CData from XML using XSLT

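I am using XSLT to extract data from the following XML. Is there any way to strip off the CData? Currently the CData is included in the extracted output.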

<Product>
<ExternalId><![CData[55037]]></ExternalId>
<Name><![CData[Reindeer Booties]]></Name>
<Description><![CData[Everybody say, "Aww!" Prepare for maximum cuteness when these plush reindeer booties are unwrapped from their special box. Faux fur provides plenty of warmth for tiny toes and softness for delicate skin. A pompom nose with 3D ears and antlers are enough to bring out the festive spirit in anyone.]]></Description>
<Brand>XYZ</Brand>
<CategoryExternalId>1_15_1</CategoryExternalId>
<ProductPageUrl><![CData[http://www.xyz.co.uk/baby-accessories/SE037/baby-reindeer-booties]]></ProductPageUrl>
<ImageUrl><![CData[http://www.xyzimages.com/images/product/16S_550.jpg]]></ImageUrl>
<SwatchImageUrl><![CData[]]></SwatchImageUrl>
<Price>84.8000</Price>
<Wasprice>154.9500</Wasprice>
<ManufacturerPartNumber></ManufacturerPartNumber>
<EAN></EAN>
<Colours><![CData[blue-pink]]</Colours>
</Product>

I am expecting the following output:

<Product>
<ExternalId>55037</ExternalId>
<Name>Reindeer Booties</Name>
<Description>Everybody say, "Aww!" Prepare for maximum cuteness when these plush reindeer booties are unwrapped from their special box. Faux fur provides plenty of warmth for tiny toes and softness for delicate skin. A pompom nose with 3D ears and antlers are enough to bring out the festive spirit in anyone.</Description>
<Brand>XYZ</Brand>
<CategoryExternalId>1_15_1</CategoryExternalId>
<ProductPageUrl>http://www.xyz.co.uk/baby-accessories/SE037/baby-reindeer-booties</ProductPageUrl>
<ImageUrl>http://www.xyzimages.com/images/product/16S_550.jpg</ImageUrl>
<SwatchImageUrl></SwatchImageUrl>
<Price>84.8000</Price>
<Wasprice>154.9500</Wasprice>
<ManufacturerPartNumber></ManufacturerPartNumber>
<EAN></EAN>
<Colours>blue-pink</Colours>
</Product>

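The real issue is that you have corrupted XML, and you should fix the source of the error rather than patch the results. The CData should not be inside an angle-bracket tag. It should start with '!' and end with ']'. The following regex will fix the error.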

using System.Xml;
using System.Xml.Linq;
using System.IO;
using System.Text.RegularExpressions;

namespace ConsoleApplication28
{
    class Program
    {
        const string FILENAME = @"c:\temp\test.xml";
        static void Main(string[] args)
        {
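            string xml = File.ReadAllText(FILENAME);
            // Strip the '<' and '>' that enclose each "![CData[..." run, leaving the marker as plain text.
            // Note: the pattern expects a closing '>', so a section that lacks it
            // (like the Colours line in the input) will not be repaired correctly.
            string pattern = @"(?'open'<)(?'cdata'!\[CData[^\>]+)(?'close'>)";
            string fixedXml = Regex.Replace(xml, pattern, "${cdata}");
            XDocument doc = XDocument.Parse(fixedXml);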
        }
    }
}

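The input you have shown us is not well-formed XML and cannot be processed by XSLT:

  • First, a CDATA section must start with <![CDATA[, not with <![CData[ (XML is case-sensitive).

  • Next, a CDATA section must end with ]]>. Line 14 of your input is missing this ending (you only have ]]).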
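If the producer of the file cannot be fixed right away, the two defects listed above could be repaired textually before parsing. The following is only a minimal sketch in C#: the class name `CDataRepair` and the method `Repair` are hypothetical, and it assumes that element content never legitimately contains the sequence `]]</`.

```csharp
using System;
using System.Text.RegularExpressions;
using System.Xml.Linq;

public static class CDataRepair
{
    // Hypothetical helper (not part of the original answers): repairs the two
    // defects described above so that the text can be parsed as XML.
    public static string Repair(string xml)
    {
        // 1. Normalize the case of the opening marker: <![CData[ becomes <![CDATA[
        string result = Regex.Replace(xml, @"<!\[CDATA\[", "<![CDATA[", RegexOptions.IgnoreCase);

        // 2. Add the missing '>' where "]]" is followed directly by a closing tag.
        //    Assumption: "]]</" never occurs as real content.
        result = Regex.Replace(result, @"\]\](?=</)", "]]>");
        return result;
    }

    public static void Main()
    {
        string broken = "<Product><Name><![CData[Reindeer Booties]]></Name>" +
                        "<Colours><![CData[blue-pink]]</Colours></Product>";

        XDocument doc = XDocument.Parse(Repair(broken));
        Console.WriteLine(doc.Root.Element("Colours").Value); // prints "blue-pink"
    }
}
```

This is a stopgap only; correcting the markers where the XML is generated remains the more robust fix.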

Once these defects are fixed, you have a well-formed XML input, for example:

XML

<Product>
    <ExternalId><![CDATA[55037]]></ExternalId>
    <Name><![CDATA[Reindeer Booties]]></Name>
    <Description><![CDATA[Everybody say, "Aww!" Prepare for maximum cuteness when these plush reindeer booties are unwrapped from their special box. Faux fur provides plenty of warmth for tiny toes and softness for delicate skin. A pompom nose with 3D ears and antlers are enough to bring out the festive spirit in anyone.]]></Description>
    <Brand>XYZ</Brand>
    <CategoryExternalId>1_15_1</CategoryExternalId>
    <ProductPageUrl><![CDATA[http://www.xyz.co.uk/baby-accessories/SE037/baby-reindeer-booties]]></ProductPageUrl>
    <ImageUrl><![CDATA[http://www.xyzimages.com/images/product/16S_550.jpg]]></ImageUrl>
    <SwatchImageUrl><![CDATA[]]></SwatchImageUrl>
    <Price>84.8000</Price>
    <Wasprice>154.9500</Wasprice>
    <ManufacturerPartNumber></ManufacturerPartNumber>
    <EAN></EAN>
    <Colours><![CDATA[blue-pink]]></Colours>
</Product>

Then you can apply a simple stylesheet containing nothing but the identity transform:

XSLT 1.0

<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>

<!-- identity transform -->
<xsl:template match="@*|node()">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
</xsl:template>

</xsl:stylesheet>

which returns:

Result

<?xml version="1.0" encoding="UTF-8"?>
<Product>
   <ExternalId>55037</ExternalId>
   <Name>Reindeer Booties</Name>
   <Description>Everybody say, "Aww!" Prepare for maximum cuteness when these plush reindeer booties are unwrapped from their special box. Faux fur provides plenty of warmth for tiny toes and softness for delicate skin. A pompom nose with 3D ears and antlers are enough to bring out the festive spirit in anyone.</Description>
   <Brand>XYZ</Brand>
   <CategoryExternalId>1_15_1</CategoryExternalId>
   <ProductPageUrl>http://www.xyz.co.uk/baby-accessories/SE037/baby-reindeer-booties</ProductPageUrl>
   <ImageUrl>http://www.xyzimages.com/images/product/16S_550.jpg</ImageUrl>
   <SwatchImageUrl/>
   <Price>84.8000</Price>
   <Wasprice>154.9500</Wasprice>
   <ManufacturerPartNumber/>
   <EAN/>
   <Colours>blue-pink</Colours>
</Product>
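For readers running this from .NET, the identity transform could be applied roughly as follows. This is a sketch rather than the answerer's code: the class `IdentityTransformDemo` and its `Transform` method are hypothetical names, the stylesheet is embedded as a string for brevity, and the XML declaration is omitted so the string output stays compact.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class IdentityTransformDemo
{
    // Identity stylesheet, embedded as a string for this sketch.
    const string Stylesheet = @"<xsl:stylesheet version=""1.0""
    xmlns:xsl=""http://www.w3.org/1999/XSL/Transform"">
  <xsl:output method=""xml"" omit-xml-declaration=""yes""/>
  <xsl:template match=""@*|node()"">
    <xsl:copy>
      <xsl:apply-templates select=""@*|node()""/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>";

    // Hypothetical helper: runs the identity transform over a well-formed XML string.
    public static string Transform(string inputXml)
    {
        var xslt = new XslCompiledTransform();
        using (var stylesheetReader = XmlReader.Create(new StringReader(Stylesheet)))
        {
            xslt.Load(stylesheetReader);
        }

        var output = new StringWriter();
        using (var inputReader = XmlReader.Create(new StringReader(inputXml)))
        using (var writer = XmlWriter.Create(output, xslt.OutputSettings))
        {
            xslt.Transform(inputReader, writer);
        }
        return output.ToString();
    }

    public static void Main()
    {
        string input = "<Product><Name><![CDATA[Reindeer Booties]]></Name></Product>";
        // The CDATA content is copied as ordinary text, so the markers
        // should no longer appear in the output.
        Console.WriteLine(Transform(input));
    }
}
```

The markers disappear because the XSLT data model treats CDATA content as ordinary text; the serializer only writes CDATA sections for elements named in `cdata-section-elements` on `xsl:output`, which this stylesheet does not use.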

Since you are using C#, you don't need XSLT at all; you can simply use LINQ to XML.

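using System.Linq;
using System.Xml.Linq;

var doc = XDocument.Load("test.xml");

// Replace every CDATA node with a plain text node holding the same value.
// ToList() takes a snapshot so the tree can be modified while iterating.
foreach (var n in doc.DescendantNodes().OfType<XCData>().ToList())
{
    n.ReplaceWith(n.Value);
}

doc.Save("test2.xml");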

Of course, as michael.hor257k points out, your input XML should be well-formed.
