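How to validate CDATA section for an XML in PHP
I create an XML document from user input. One of the XML nodes has a CDATA section. If one of the characters inserted into the CDATA section is 'special' (a control character, I think), the whole XML becomes invalid.
Example: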
$dom = new DOMDocument('1.0', 'utf-8');
$dom->appendChild($dom->createElement('root'))
->appendChild($dom->createCDATASection(
"This is some text with a SOH char \x01."
));
$test = new DOMDocument;
$test->loadXml($dom->saveXML());
echo $test->saveXml();
This gives:
Warning: DOMDocument::loadXML(): CData section not finished
This is some text with a SOH cha in Entity, line: 2 in /newfile.php on line 17
Warning: DOMDocument::loadXML(): PCDATA invalid Char value 1 in Entity, line: 2 in /newfile.php on line 17
Warning: DOMDocument::loadXML(): Sequence ']]>' not allowed in content in Entity, line: 2 in /newfile.php on line 17
Warning: DOMDocument::loadXML(): Sequence ']]>' not allowed in content in Entity, line: 2 in /newfile.php on line 17
Warning: DOMDocument::loadXML(): internal errorExtra content at the end of the document in Entity, line: 2 in /newfile.php on line 17
<?xml version="1.0"?>
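Is there a good way in PHP to make sure the CDATA section is valid?
"\x01" is a non-printable control character, and XML 1.0 does not allow it anywhere in a document, not even inside a CDATA section. That is what causes the warnings. You can work around the problem like this: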
$dom = new DOMDocument('1.0', 'utf-8');
$dom->appendChild($dom->createElement('root'))
->appendChild($dom->createCDATASection(
urlencode("This is some text with a SOH char \x01.")
));
$test = new DOMDocument;
$test->loadXml($dom->saveXML());
echo urldecode($test->saveXml());
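Note that calling urldecode() on the whole serialized document puts the raw control character back into the output, so the echoed string is no longer well-formed XML. A minimal sketch of a variant, assuming the goal is to recover the original text from the parsed node (the variable names $original and $decoded are illustrative):

```php
<?php
$original = "This is some text with a SOH char \x01.";

// Encode the text before it goes into the CDATA section.
$dom = new DOMDocument('1.0', 'utf-8');
$dom->appendChild($dom->createElement('root'))
    ->appendChild($dom->createCDATASection(urlencode($original)));

// The serialized document now parses without warnings.
$test = new DOMDocument;
$test->loadXML($dom->saveXML());

// Decode only the text of the node, not the serialized document.
$decoded = urldecode($test->documentElement->textContent);

var_dump($decoded === $original);
```

With this approach every consumer of the XML has to know that the node content is URL-encoded.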
Building on Gordon's answer, I wrote:
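/**
 * Removes invalid characters from an HTML string
 *
 * @param string $content UTF-8 encoded text
 *
 * @return string
 */
function sanitize_html($content) {
    // Strict checks, so that a string such as "0" is not discarded.
    if ($content === null || $content === '') return '';
    // Negated XML 1.0 Char production. Code points need the \x{...} form and
    // the /u modifier; "\xD7FF" would be read as \xD7 followed by "FF".
    $invalid_characters = '/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/u';
    // preg_replace() returns null when $content is not valid UTF-8.
    $clean = preg_replace($invalid_characters, '', $content);
    return $clean === null ? '' : $clean;
}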
Used like this:
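A minimal sketch of such usage, assuming the sanitized string is what gets passed to createCDATASection(). The function is repeated here in compact form, with Unicode-aware escapes, so that the snippet runs on its own; $input is an illustrative variable name.

```php
<?php
// Compact copy of sanitize_html() so this snippet is self-contained.
function sanitize_html($content) {
    if ($content === null || $content === '') return '';
    // Everything outside the XML 1.0 Char production is removed.
    $invalid_characters = '/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/u';
    $clean = preg_replace($invalid_characters, '', $content);
    return $clean === null ? '' : $clean;
}

$input = "This is some text with a SOH char \x01.";

// Sanitize the user input before it goes into the CDATA section.
$dom = new DOMDocument('1.0', 'utf-8');
$dom->appendChild($dom->createElement('root'))
    ->appendChild($dom->createCDATASection(sanitize_html($input)));

// The document can now be parsed again without warnings.
$test = new DOMDocument;
$loaded = $test->loadXML($dom->saveXML());
var_dump($loaded);
echo $test->documentElement->textContent, "\n";
```

The control character is dropped silently, so the stored text differs from what the user typed.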
Take a look at simplexml_load_file
( http://php.net/manual/en/function.simplexml-load-file.php ) and the LIBXML_NOCDATA
option ( http://www.php.net/manual/en/libxml.constants.php ). This will most likely answer your question.
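A short sketch of what the option does, using simplexml_load_string() instead of simplexml_load_file() so that no file is needed (both accept the same options argument). LIBXML_NOCDATA merges CDATA sections into ordinary text nodes; as far as I can tell it does not make a forbidden character acceptable, so the input still has to be cleaned first. The sample strings are made up for illustration.

```php
<?php
$xml = '<root><![CDATA[Some <b>text</b> in CDATA]]></root>';

// With LIBXML_NOCDATA the CDATA section is merged into a plain text node.
$root = simplexml_load_string($xml, 'SimpleXMLElement', LIBXML_NOCDATA);
echo (string) $root, "\n";

// Collect parser errors instead of emitting warnings.
libxml_use_internal_errors(true);

// A control character inside CDATA still makes the document unparsable.
$bad = simplexml_load_string(
    "<root><![CDATA[SOH \x01]]></root>",
    'SimpleXMLElement',
    LIBXML_NOCDATA
);
var_dump($bad);
```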