
Use DTD to Define an Element as CDATA?

In short, is it possible to use a DTD to define an element as containing CDATA?

I'm calling a third-party API that produces some invalid content inside an element. Specifically, the data contains HTML entities such as &rsquo;. When I try to parse this XML with SimpleXML, I of course get the parser error "Entity 'rsquo' not defined". Here's a simplified example of the structure I'm dealing with:

<items>
    <item>
        <name>Jim Smith</name>
        <description>Jim&rsquo;s description breaks my parser</description>
    </item>
</items>

Since I can't fix the API response, I've resorted to a dirty trick: injecting a CDATA section into the problem element just before I parse it:

$xml = str_replace("<description>", "<description><![CDATA[", $xml);
$xml = str_replace("</description>", "]]></description>", $xml);

This fixes the issue for me, but the overhead is probably too big, don't you think? The XML can be anywhere from 30K to 100K of data.
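As a rough way to check the overhead concern, the two replacements can be folded into one `str_replace` call and timed on a payload of about the stated size. This is only a sketch: the helper name `wrapDescriptionInCdata` and the synthetic payload are made up for illustration, and `hrtime` assumes PHP 7.3 or later.

```php
<?php
// Hypothetical helper: wraps the contents of every <description> element
// in a CDATA section, using one str_replace call with parallel arrays.
function wrapDescriptionInCdata(string $xml): string
{
    return str_replace(
        ['<description>', '</description>'],
        ['<description><![CDATA[', ']]></description>'],
        $xml
    );
}

// Synthetic payload of roughly 100 KB, built from the sample structure above.
$item = '<item><name>Jim Smith</name>'
      . '<description>Jim&rsquo;s description breaks my parser</description></item>';
$xml = '<items>' . str_repeat($item, 1000) . '</items>';

// Time only the string replacement, in milliseconds.
$start = hrtime(true);
$wrapped = wrapDescriptionInCdata($xml);
$elapsedMs = (hrtime(true) - $start) / 1e6;

$items = simplexml_load_string($wrapped);
printf("%d bytes wrapped in %.3f ms\n", strlen($xml), $elapsedMs);

// Inside CDATA the entity is plain text, so it comes back undecoded.
echo $items->item[0]->description, "\n";
```

Note that inside a CDATA section the text stays literally `&rsquo;`, so it would still need something like `html_entity_decode` afterwards, and the trick breaks if a description ever contains `]]>`.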

I'd rather use a DTD, but for the life of me I can't find anything in the specs that allows defining an element as CDATA (in the same way I can define PCDATA). Below is what I'd like to do, but of course it's invalid because of the '#CDATA' declaration:

<!DOCTYPE items [
    <!ELEMENT items (item+)>
    <!ELEMENT item (name, description)>
    <!ELEMENT name (#PCDATA)>
    <!ELEMENT description (#CDATA)>
]>

Thanks for any insights!

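This is possible in an SGML DTD (for example, the HTML 4.01 script element), but not in an XML DTD (hence the changes in XHTML 1.0). In an XML DTD, CDATA exists only as an attribute type; element content can only be declared as EMPTY, ANY, mixed content with #PCDATA, or child elements.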
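An XML DTD cannot mark an element as CDATA, but it can declare the missing entity, which removes the "Entity 'rsquo' not defined" error at its source. The sketch below injects an internal DTD subset before the root element. The helper name `declareEntities` is hypothetical, and the code assumes the API response carries no DOCTYPE of its own and that the set of offending entities is known in advance.

```php
<?php
// Hypothetical helper: inserts an internal DTD subset that declares the
// given entities, placed just before the root element's start tag.
function declareEntities(string $xml, string $root, array $entities): string
{
    $decls = '';
    foreach ($entities as $name => $codePoint) {
        // Each entity is declared as a numeric character reference.
        $decls .= sprintf('<!ENTITY %s "&#%d;">', $name, $codePoint);
    }
    $doctype = "<!DOCTYPE {$root} [{$decls}]>";

    // Assumption: the first "<root" occurrence is the root start tag,
    // so the DOCTYPE lands after any XML declaration.
    $pos = strpos($xml, '<' . $root);
    if ($pos === false) {
        return $xml; // root element not found; leave the input unchanged
    }
    return substr_replace($xml, $doctype, $pos, 0);
}

$xml = '<items><item><name>Jim Smith</name>'
     . '<description>Jim&rsquo;s description breaks my parser</description>'
     . '</item></items>';

// 8217 is the code point of the right single quotation mark (U+2019).
$withDtd = declareEntities($xml, 'items', ['rsquo' => 8217]);

// LIBXML_NOENT asks libxml to substitute entity references with their values.
$items = simplexml_load_string($withDtd, 'SimpleXMLElement', LIBXML_NOENT);
echo $items->item[0]->description, "\n";
```

Unlike the CDATA trick, this yields the decoded character in the parsed text. `LIBXML_NOENT` also enables substitution of external entities, so it deserves care on input that might bring its own DOCTYPE.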

