Remove starting and ending spaces from XML elements
How can I remove all whitespace characters before and after the content of an XML field?
<data version="2.0">
<field>
1
</field>
<field something=" some attribute here... ">
2
</field>
</data>
Notice the whitespace around 1 and 2 and inside the attribute value ' some attribute here... '. I want to remove it with PHP.
if(($xml = simplexml_load_file($file)) === false) die();
print_r($xml);
Also, the data doesn't appear to be a string; I need to put a (string) cast before each variable. Why?
You may want to use something like this:
$str = file_get_contents($file);
$str = preg_replace('~\s*(<([^>]*)>[^<]*</\2>|<[^>]*>)\s*~','$1',$str);
$xml = simplexml_load_string($str, 'SimpleXMLElement', LIBXML_NOCDATA);
I haven't tried this, but you can find more on this at http://www.lonhosford.com/lonblog/2011/01/07/php-simplexml-load-xml-file-preserve-cdata-remove-whitespace-between-nodes-and-return-json/ .
Note that the spaces between the opening and closing tags ( <x> _space_ </x> ) and inside attribute values ( <x attr=" _space_ "> ) are actually part of the XML document's data (in contrast with the spaces between <x> _space_ <y> ), so I would suggest that the source you use should be a bit less messy with spaces.
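The distinction above can be illustrated with a small sketch. It assumes the DOM extension and uses a made-up inline document; the `preserveWhiteSpace` setting asks the parser to drop whitespace-only text between elements, while whitespace inside an element's text is kept because it is data:

```php
<?php
// Illustrative document (not from the question): whitespace between
// elements, and whitespace inside the leaf element's text.
$source = "<data>\n  <field> 1 </field>\n</data>";

$doc = new DOMDocument();
// Ask the parser to discard whitespace-only text between elements.
$doc->preserveWhiteSpace = false;
$doc->loadXML($source);

$root = $doc->documentElement;

// Number of child nodes left under <data>.
$childCount = $root->childNodes->length;

// Text of the leaf element; its surrounding spaces are part of the data.
$fieldText = $root->firstChild->textContent;

var_dump($childCount, $fieldText);
```

The whitespace that survives inside the leaf element is what has to be trimmed explicitly.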
To do that in PHP you first have to import the document into a DOMDocument so that you can properly address the nodes whose whitespace you want to normalize via DOMXPath. The XPath support in SimpleXMLElement is too limited to access text-nodes as precisely as this operation requires.
An XPath query to access all text-nodes that are within leaf-elements and all attributes is:
//*[not(*)]/text() | //@*
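To see what this query selects, here is a minimal sketch that runs it against a small made-up document (the inline XML and the variable names are assumptions for illustration only) and records the class and value of each matched node:

```php
<?php
// Illustrative document with leaf elements and attributes at two levels.
$doc = new DOMDocument();
$doc->loadXML(
    '<data version="2.0"><field> 1 </field>'
    . '<group><field a=" x "> 2 </field></group></data>'
);

$xpath = new DOMXPath($doc);

$matched = [];
$classes = [];
foreach ($xpath->query('//*[not(*)]/text() | //@*') as $node) {
    // Text nodes of leaf elements are DOMText, attributes are DOMAttr.
    $classes[] = get_class($node);
    $matched[] = get_class($node) . ': ' . json_encode($node->nodeValue);
}

print_r($matched);
```

Only the text of the two leaf field elements and the two attributes are matched; data and group contribute no text nodes because they contain child elements.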
Given that $xml is a SimpleXMLElement, you could do whitespace normalization like in the following example:
$doc = dom_import_simplexml($xml)->ownerDocument;
$xpath = new DOMXPath($doc);
foreach ($xpath->query('//*[not(*)]/text()|//@*') as $node) {
/** @var $node DOMText|DOMAttr */
$node->nodeValue = trim(preg_replace('~\s+~u', ' ', $node->nodeValue), ' ');
}
You could perhaps stretch this to all text-nodes (as suggested in a related Q&A), but this might require document normalization under some circumstances. As text() in XPath does not distinguish between text-nodes and CDATA-sections, you might want to skip this type of node (DOMCdataSection) or expand them into text-nodes when loading the document (use the LIBXML_NOCDATA option for that) to achieve more useful results.
"Also the data doesn't appear to be string, I need to append (string) before each variable. Why?"
Because it's an object of type SimpleXMLElement. If you want the string value of such an object (element), you need to cast it to string. See also the related reference question on this topic.
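As a minimal sketch of that cast (the inline XML and variable names are illustrative, not taken from the question's file):

```php
<?php
// Illustrative input that mirrors the structure of the question's document.
$xml = simplexml_load_string('<data><field> 1 </field></data>');

// Property access yields a SimpleXMLElement object, not a string.
$isObject = $xml->field instanceof SimpleXMLElement;

// Casting yields the element's text content, whitespace included.
$raw = (string) $xml->field;

// trim() then removes the leading and trailing whitespace.
$value = trim($raw);

var_dump($isObject, $raw, $value);
```

The cast alone does not remove any whitespace; it only converts the element to its text content, so trimming remains a separate step.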
And last but not least: don't trust print_r or var_dump when you use it on a SimpleXMLElement: it's not showing the truth. E.g. you could override __toString() which could also solve your issue:
class TrimXMLElement extends SimpleXMLElement
{
public function __toString()
{
return trim(preg_replace('~\s+~u', ' ', parent::__toString()), ' ');
}
}
$xml = simplexml_load_string($buffer, 'TrimXMLElement');
print_r($xml);
Even though the override applies whenever the element is cast to string (e.g. with echo), the output of print_r still would not reflect these changes. So better not rely on it; it can never show the whole picture.
Full example code for this answer (Online Demo):
<?php
/**
* Remove starting and ending spaces from XML elements
*
* @link https://stackoverflow.com/a/31793566/367456
*/
$buffer = <<<XML
<data version="2.0">
<field>
1
</field>
<field something=" some attribute here... ">
2 <![CDATA[ 34 ]]>
</field>
</data>
XML;
class TrimXMLElement extends SimpleXMLElement implements JsonSerializable
{
public function __toString()
{
return trim(preg_replace('~\s+~u', ' ', parent::__toString()), ' ');
}
function jsonSerialize()
{
$array = (array) $this;
array_walk_recursive($array, function(&$value) {
if (is_string($value)) {
$value = trim(preg_replace('~\s+~u', ' ', $value), ' ');
}
});
return $array;
}
}
$xml = simplexml_load_string($buffer, 'TrimXMLElement', LIBXML_NOCDATA);
print_r($xml);
echo json_encode($xml);
$xml = simplexml_load_string($buffer, null, LIBXML_NOCDATA);
$doc = dom_import_simplexml($xml)->ownerDocument;
$doc->normalizeDocument();
$doc->normalize();
$xpath = new DOMXPath($doc);
foreach ($xpath->query('//*[not(*)]/text()|//@*') as $node) {
/** @var $node DOMText|DOMAttr|DOMCdataSection */
if ($node instanceof DOMCdataSection) {
continue;
}
$node->nodeValue = trim(preg_replace('~\s+~u', ' ', $node->nodeValue), ' ');
}
echo $xml->asXML();
simplexml_load_file() returns a SimpleXMLElement object rather than an array, but once the data has been converted into a nested array (for example with json_decode(json_encode($xml), true)), you could do something like this:
function TrimArray($input){
if (!is_array($input))
return trim($input);
return array_map('TrimArray', $input);
}
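A short usage sketch for the function above. The hand-written array is an assumption that stands in for XML data already converted to a nested array; the function is repeated so the snippet runs on its own:

```php
<?php
// Same recursive helper as above, repeated to keep the sketch self-contained.
function TrimArray($input) {
    if (!is_array($input)) {
        return trim($input);
    }
    return array_map('TrimArray', $input);
}

// Hypothetical data standing in for a converted XML document.
$data = [
    '@attributes' => ['version' => ' 2.0 '],
    'field'       => [" 1 ", "\n2\n"],
];

// array_map() keeps the string keys, so the structure is preserved.
$trimmed = TrimArray($data);

print_r($trimmed);
```

Note that trim() only strips leading and trailing whitespace; runs of whitespace inside a value are left as they are.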