Remove starting and ending spaces from XML elements

Question

How can I remove all whitespace characters before and after the content of an XML field?

<data version="2.0">

  <field> 

     1 

  </field>        

  <field something=" some attribute here... "> 

     2  

  </field>

</data>

Notice the whitespace around 1 and 2 and around 'some attribute here...'. I want to remove it with PHP.

if(($xml = simplexml_load_file($file)) === false) die();

print_r($xml);

Also, the data doesn't appear to be a string: I need to prepend (string) to each variable. Why?

Answer 1

You may want to use something like this:

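// Read the raw XML text.
$str = file_get_contents($file);
// Remove whitespace before and after tags; the first alternative keeps a simple
// <x>...</x> element, including its inner text, intact.
$str = preg_replace('~\s*(<([^>]*)>[^<]*</\2>|<[^>]*>)\s*~','$1',$str);
// Parse the cleaned string.
$xml = simplexml_load_string($str,'SimpleXMLElement', LIBXML_NOCDATA);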

I haven't tried this, but you can find more on this at http://www.lonhosford.com/lonblog/2011/01/07/php-simplexml-load-xml-file-preserve-cdata-remove-whitespace-between-nodes-and-return-json/ .

Note that the spaces between the opening and closing tags (<x> _space_ </x>) and inside attribute values (<x attr=" _space_ ">) are actually part of the XML document's data (in contrast with the spaces between <x> _space_ <y>), so I would suggest that the source you use should be a bit less messy with spaces.
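To illustrate that distinction, here is a small sketch with a made-up document (not the one from the question). Loading it with the LIBXML_NOBLANKS option lets libxml drop the whitespace-only indentation between elements, while the spaces inside the element text and inside the attribute value are kept, because they are data:

```php
<?php
// Hypothetical sample: indentation between elements, plus spaces inside
// an element's text and inside an attribute value.
$src = "<data>\n  <x attr=\" a \"> 1 </x>\n  <y/>\n</data>";

// LIBXML_NOBLANKS asks libxml to drop whitespace-only text nodes that
// only serve as formatting between elements.
$xml = simplexml_load_string($src, 'SimpleXMLElement', LIBXML_NOBLANKS);

// The indentation between <data>, <x> and <y> is gone in the serialized output.
echo $xml->asXML();

// The whitespace that is part of the data is still there after loading.
var_dump((string) $xml->x);         // string(3) " 1 "
var_dump((string) $xml->x['attr']); // string(3) " a "
```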

Answer 2

To do that in PHP, you first have to convert the document into a DOMDocument so that you can properly address, via DOMXPath, the nodes whose whitespace you want to normalize. The XPath support in SimpleXMLElement is too limited to access text nodes as precisely as this operation requires.

An XPath query that selects all text nodes inside leaf elements, plus all attributes, is:

//*[not(*)]/text() | //@*
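A quick way to see what this query selects, sketched on a made-up document: the text of the leaf elements <a> and <c> and the attribute v are matched, while the space directly inside <b> is not, because <b> has a child element and is therefore not a leaf:

```php
<?php
// Hypothetical sample: <a> and <c> are leaf elements, <b> is not.
$doc = new DOMDocument();
$doc->loadXML('<data v=" 2 "><a> x </a><b> <c> y </c> </b></data>');

$xpath = new DOMXPath($doc);
$found = [];
foreach ($xpath->query('//*[not(*)]/text() | //@*') as $node) {
    // Record the node class and its raw value, e.g. "DOMText| x ".
    $found[] = get_class($node) . '|' . $node->nodeValue;
}

print_r($found);
```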

Given that $xml is a SimpleXMLElement, you could normalize whitespace as in the following example:

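// Get the DOMDocument behind the SimpleXMLElement (both share the same tree).
$doc   = dom_import_simplexml($xml)->ownerDocument;
$xpath = new DOMXPath($doc);
// Text nodes of leaf elements and all attributes.
foreach ($xpath->query('//*[not(*)]/text()|//@*') as $node) {
    /** @var $node DOMText|DOMAttr */
    // Collapse each run of whitespace to a single space, then trim the ends.
    $node->nodeValue = trim(preg_replace('~\s+~u', ' ', $node->nodeValue), ' ');
}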
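Applied to the XML from the question (indentation simplified here), the loop above should leave trimmed values that are visible through the original SimpleXMLElement, since dom_import_simplexml() works on the same underlying document rather than a copy. A minimal sketch:

```php
<?php
// The XML from the question, with the blank lines removed.
$buffer = <<<XML
<data version="2.0">
  <field>
     1
  </field>
  <field something=" some attribute here... ">
     2
  </field>
</data>
XML;

$xml = simplexml_load_string($buffer);

// Same normalization loop as above: collapse whitespace runs, then trim.
$doc   = dom_import_simplexml($xml)->ownerDocument;
$xpath = new DOMXPath($doc);
foreach ($xpath->query('//*[not(*)]/text()|//@*') as $node) {
    $node->nodeValue = trim(preg_replace('~\s+~u', ' ', $node->nodeValue), ' ');
}

// SimpleXML and DOM share the same tree, so the changes are visible here.
var_dump((string) $xml->field[0]);              // string(1) "1"
var_dump((string) $xml->field[1]['something']); // string(22) "some attribute here..."
```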

You could perhaps stretch this to all text nodes (as suggested in a related Q&A), but under some circumstances this might require normalizing the document first. As text() in XPath does not distinguish between text nodes and CDATA sections, you might want to skip nodes of that type (DOMCdataSection), or expand them into text nodes when loading the document (use the LIBXML_NOCDATA option for that), to achieve more useful results.
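To check the CDATA behaviour, the following sketch runs the text-node part of the query once without options and once with LIBXML_NOCDATA. The helper textNodeClasses() and the sample document are made up for illustration:

```php
<?php
// Hypothetical sample: a leaf element containing text followed by a CDATA section.
$src = '<r><a> 2 <![CDATA[ 34 ]]></a></r>';

/** Hypothetical helper: class names of the text() nodes inside leaf elements. */
function textNodeClasses(string $src, int $options): array
{
    $doc = new DOMDocument();
    $doc->loadXML($src, $options);
    $xpath = new DOMXPath($doc);

    $classes = [];
    foreach ($xpath->query('//*[not(*)]/text()') as $node) {
        $classes[] = get_class($node);
    }
    return $classes;
}

print_r(textNodeClasses($src, 0));              // DOMText and DOMCdataSection
print_r(textNodeClasses($src, LIBXML_NOCDATA)); // no DOMCdataSection any more
```

Without the option, text() matches the CDATA section as a separate DOMCdataSection node; with LIBXML_NOCDATA its content is parsed as ordinary text.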

Also, the data doesn't appear to be a string: I need to prepend (string) to each variable. Why?

Because it's an object of type SimpleXMLElement. If you want the string value of such an object (element), you need to cast it to string. See also the following reference question:

Forcing a SimpleXML Object to a string, regardless of context
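A short sketch of the difference, on a made-up one-field document: the property access yields an object, and only the explicit cast produces a string that you can compare strictly or pass to string functions:

```php
<?php
$xml = simplexml_load_string('<data><field> 1 </field></data>');

// Property access returns another SimpleXMLElement, not a string.
var_dump($xml->field instanceof SimpleXMLElement); // bool(true)
var_dump($xml->field === ' 1 ');                   // bool(false)

// An explicit cast gives the text content, whitespace included...
$value = (string) $xml->field;
var_dump($value);       // string(3) " 1 "

// ...which can then be used with ordinary string functions.
var_dump(trim($value)); // string(1) "1"
```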

And last but not least: don't trust print_r or var_dump when you use them on a SimpleXMLElement; they don't show the whole truth. For example, you could override __toString(), which could also solve your issue:

class TrimXMLElement extends SimpleXMLElement
{
    public function __toString()
    {
        return trim(preg_replace('~\s+~u', ' ', parent::__toString()), ' ');
    }
}

$xml = simplexml_load_string($buffer, 'TrimXMLElement');

print_r($xml);

Even though the string cast would normally apply (e.g. with echo), the output of print_r still would not reflect these changes. So better not rely on it: it can never show the whole picture.

Full example code for this answer (Online Demo):

<?php
/**
 * Remove starting and ending spaces from XML elements
 *
 * @link https://stackoverflow.com/a/31793566/367456
 */

$buffer = <<<XML
<data version="2.0">

  <field>

     1

  </field>

  <field something=" some attribute here... ">

     2 <![CDATA[ 34 ]]>

  </field>

</data>
XML;

class TrimXMLElement extends SimpleXMLElement implements JsonSerializable
{
    public function __toString()
    {
        return trim(preg_replace('~\s+~u', ' ', parent::__toString()), ' ');
    }

    #[\ReturnTypeWillChange]
    public function jsonSerialize()
    {
        $array = (array) $this;

        array_walk_recursive($array, function(&$value) {
            if (is_string($value)) {
                $value  = trim(preg_replace('~\s+~u', ' ', $value), ' ');
            }
        });

        return $array;
    }
}

$xml = simplexml_load_string($buffer, 'TrimXMLElement', LIBXML_NOCDATA);

print_r($xml);
echo json_encode($xml);

$xml = simplexml_load_string($buffer, null, LIBXML_NOCDATA);

$doc = dom_import_simplexml($xml)->ownerDocument;
$doc->normalizeDocument();
$doc->normalize();

$xpath = new DOMXPath($doc);
foreach ($xpath->query('//*[not(*)]/text()|//@*') as $node) {
    /** @var $node DOMText|DOMAttr|DOMCdataSection */
    if ($node instanceof DOMCdataSection) {
        continue;
    }
    $node->nodeValue = trim(preg_replace('~\s+~u', ' ', $node->nodeValue), ' ');
}

echo $xml->asXML();

Answer 3

If you work with the data as an array (note that simplexml_load_file() itself returns a SimpleXMLElement object, so it has to be converted first), you could do something like this:

function TrimArray($input){

    if (!is_array($input))
        return trim($input);

    return array_map('TrimArray', $input);
}
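A usage sketch, assuming the data has already been turned into a nested array of strings. The array below is hypothetical and stands in for values taken from the question's XML:

```php
<?php
// TrimArray() as above: trims every string, recursing into nested arrays.
function TrimArray($input)
{
    if (!is_array($input)) {
        return trim($input);
    }
    return array_map('TrimArray', $input);
}

// Hypothetical data standing in for values extracted from the XML.
$data = [
    'version' => '2.0',
    'field'   => ["\n\n     1 \n\n  ", "\n\n     2  \n\n  "],
    'attr'    => [' some attribute here... '],
];

// array_map() with a single array keeps the keys, so the structure is preserved.
print_r(TrimArray($data));
```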

3 answers

Answer 1: score 2, 2011-09-07 17:33:15

Answer 2: score 1, 2015-08-03 17:47:10

Answer 3: score 1, 2011-09-07 17:29:14
