
Remove starting and ending spaces from XML elements

How can I remove all whitespace characters before and after the content of an XML field?

<data version="2.0">

  <field> 

     1 

  </field>        

  <field something=" some attribute here... "> 

     2  

  </field>

</data>

Notice the whitespace around 1, 2 and 'some attribute here...'. I want to remove it with PHP.

if(($xml = simplexml_load_file($file)) === false) die();

print_r($xml);

Also, the data doesn't appear to be a string; I need to put a (string) cast before each variable. Why?

You may want to use something like this:

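// Read the raw XML, then strip whitespace around tags before parsing.
$str = file_get_contents($file);
$str = preg_replace('~\s*(<([^>]*)>[^<]*</\2>|<[^>]*>)\s*~','$1',$str);
// Parse the cleaned string and merge CDATA sections into text nodes.
$xml = simplexml_load_string($str, 'SimpleXMLElement', LIBXML_NOCDATA);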

I haven't tried this, but you can find more on it at http://www.lonhosford.com/lonblog/2011/01/07/php-simplexml-load-xml-file-preserve-cdata-remove-whitespace-between-nodes-and-return-json/ .

Note that the spaces between an opening and a closing tag ( <x> _space_ </x> ) and inside attribute values ( <x attr=" _space_ "> ) are actually part of the XML document's data (in contrast with the spaces between two tags, as in <x> _space_ <y> ), so I would suggest that the source you use be less messy with spaces.
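The distinction between the two kinds of whitespace can be illustrated with a minimal sketch. It assumes a made-up sample string ($source) and uses DOMDocument with preserveWhiteSpace disabled; the parser is then expected to drop the whitespace-only nodes between tags while leaving the spaces inside the leaf element alone, because those are data.

```php
<?php
// Hypothetical sample: whitespace between tags and inside a leaf element.
$source = "<data>\n  <field> 1 </field>\n</data>";

$doc = new DOMDocument();
$doc->preserveWhiteSpace = false; // ask the parser to drop whitespace-only text nodes
$doc->loadXML($source);

$root  = $doc->documentElement;
$field = $root->firstChild;

// Number of child nodes left under <data> after parsing.
var_dump($root->childNodes->length);
// Content of <field>: the surrounding spaces are part of the data.
var_dump($field->textContent);
```

Trimming the content of the leaf element therefore needs an explicit step in your own code.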

To normalize the whitespace in PHP, you first have to convert the document into a DOMDocument so that you can properly address the nodes you want to normalize via DOMXPath. The XPath support in SimpleXMLElement is too limited to access text nodes as precisely as this operation needs.

An XPath query that selects all text nodes inside leaf elements, plus all attributes, is:

//*[not(*)]/text() | //@*

Given that $xml is a SimpleXMLElement, you could normalize the whitespace as in the following example:

$doc   = dom_import_simplexml($xml)->ownerDocument;
$xpath = new DOMXPath($doc);
foreach ($xpath->query('//*[not(*)]/text()|//@*') as $node) {
    /** @var $node DOMText|DOMAttr */
    $node->nodeValue = trim(preg_replace('~\s+~u', ' ', $node->nodeValue), ' ');
}
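To see the effect end to end, here is a self-contained sketch that applies the same loop to a sample modeled on the question's document. The sample string is an assumption for illustration; since the SimpleXMLElement and the DOMDocument share the same underlying document, the changes should be visible through $xml afterwards.

```php
<?php
// Sample input modeled on the question's document (an assumption for this sketch).
$xml = simplexml_load_string(<<<XML
<data version="2.0">
  <field>
     1
  </field>
  <field something=" some attribute here... ">
     2
  </field>
</data>
XML
);

$doc   = dom_import_simplexml($xml)->ownerDocument;
$xpath = new DOMXPath($doc);

// Collapse runs of whitespace to one space, then trim both ends.
foreach ($xpath->query('//*[not(*)]/text()|//@*') as $node) {
    $node->nodeValue = trim(preg_replace('~\s+~u', ' ', $node->nodeValue), ' ');
}

// Read the values back through SimpleXML.
var_dump((string) $xml->field[0]);
var_dump((string) $xml->field[1]);
var_dump((string) $xml->field[1]['something']);
```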

You could perhaps extend this to all text nodes (as suggested in a related Q&A), but this might require document normalization in some circumstances. Since text() in XPath does not distinguish between text nodes and CDATA sections, you might want to skip CDATA nodes ( DOMCdataSection ) or expand them into text nodes when loading the document (use the LIBXML_NOCDATA option for that) to get more useful results.
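The effect of LIBXML_NOCDATA on what text() returns can be checked with a small sketch. The sample string and the helper name countCdataNodes are hypothetical and exist only for this illustration.

```php
<?php
// Hypothetical sample with mixed text and a CDATA section.
$source = '<field> 2 <![CDATA[ 34 ]]></field>';

// Hypothetical helper: count the CDATA sections an XPath text() query returns.
function countCdataNodes(DOMDocument $doc): int
{
    $count = 0;
    foreach ((new DOMXPath($doc))->query('//text()') as $node) {
        if ($node instanceof DOMCdataSection) {
            $count++;
        }
    }
    return $count;
}

$plain = new DOMDocument();
$plain->loadXML($source);                  // CDATA section kept as its own node type

$merged = new DOMDocument();
$merged->loadXML($source, LIBXML_NOCDATA); // CDATA section expanded into text

var_dump(countCdataNodes($plain));
var_dump(countCdataNodes($merged));
var_dump($merged->documentElement->textContent);
```

Without the option, the loop has to decide what to do with DOMCdataSection nodes; with it, everything arrives as ordinary text and can be normalized the same way.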


Also, the data doesn't appear to be a string; I need to put a (string) cast before each variable. Why?

Because it's an object of type SimpleXMLElement. If you want the string value of such an object (element), you need to cast it to string.
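A minimal sketch of that cast, using a made-up one-field document: the property access yields another SimpleXMLElement, and only the explicit cast produces a plain string that trim() can then clean up.

```php
<?php
// Made-up sample document for illustration.
$xml = simplexml_load_string('<data><field> 1 </field></data>');

$field = $xml->field;            // still a SimpleXMLElement object
$value = trim((string) $field);  // explicit cast to string, then trim

var_dump($field instanceof SimpleXMLElement); // bool(true)
var_dump($value);                             // string(1) "1"
```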


And last but not least: don't trust print_r or var_dump when you use them on a SimpleXMLElement: they don't show the whole truth. For example, you could override __toString(), which could also solve your issue:

class TrimXMLElement extends SimpleXMLElement
{
    public function __toString()
    {
        return trim(preg_replace('~\s+~u', ' ', parent::__toString()), ' ');
    }
}

$xml = simplexml_load_string($buffer, 'TrimXMLElement');

print_r($xml);

Even though the override applies wherever the element is cast to string (e.g. with echo), the output of print_r still does not reflect these changes. So better not rely on it; it can never show the whole picture.


Full example code for this answer:

<?php
/**
 * Remove starting and ending spaces from XML elements
 *
 * @link https://stackoverflow.com/a/31793566/367456
 */

$buffer = <<<XML
<data version="2.0">

  <field>

     1

  </field>

  <field something=" some attribute here... ">

     2 <![CDATA[ 34 ]]>

  </field>

</data>
XML;

class TrimXMLElement extends SimpleXMLElement implements JsonSerializable
{
    public function __toString()
    {
        return trim(preg_replace('~\s+~u', ' ', parent::__toString()), ' ');
    }

    public function jsonSerialize(): array
    {
        $array = (array) $this;

        array_walk_recursive($array, function(&$value) {
            if (is_string($value)) {
                $value  = trim(preg_replace('~\s+~u', ' ', $value), ' ');
            }
        });

        return $array;
    }
}

$xml = simplexml_load_string($buffer, 'TrimXMLElement', LIBXML_NOCDATA);

print_r($xml);
echo json_encode($xml);

$xml = simplexml_load_string($buffer, null, LIBXML_NOCDATA);

$doc = dom_import_simplexml($xml)->ownerDocument;
$doc->normalizeDocument();
$doc->normalize();

$xpath = new DOMXPath($doc);
foreach ($xpath->query('//*[not(*)]/text()|//@*') as $node) {
    /** @var $node DOMText|DOMAttr|DOMCdataSection */
    if ($node instanceof DOMCdataSection) {
        continue;
    }
    $node->nodeValue = trim(preg_replace('~\s+~u', ' ', $node->nodeValue), ' ');
}

echo $xml->asXML();

Once the data from simplexml_load_file() has been converted to an array (the function itself returns a SimpleXMLElement object, not an array), you could do something like this:

function TrimArray($input){

    if (!is_array($input))
        return trim($input);

    return array_map('TrimArray', $input);
}
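A possible way to use it, as a sketch: simplexml_load_string() and simplexml_load_file() return a SimpleXMLElement, so the example below first converts it to a plain array with a json_encode()/json_decode() round trip. That conversion step and the sample document are assumptions for illustration, not part of the answer above; TrimArray is repeated so the sketch runs on its own.

```php
<?php
// Same function as above, repeated so this sketch is self-contained.
function TrimArray($input)
{
    if (!is_array($input)) {
        return trim($input);
    }
    return array_map('TrimArray', $input);
}

// Made-up sample document for illustration.
$xml = simplexml_load_string('<data><field> 1 </field><field> 2 </field></data>');

// Assumed conversion step: turn the SimpleXMLElement into a nested array.
$array = json_decode(json_encode($xml), true);

$trimmed = TrimArray($array);
print_r($trimmed);
```

Note that this only trims leading and trailing whitespace; it does not collapse whitespace inside a value, and the JSON round trip can lose attributes on elements that also carry text.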
