
XML XPath ignore Case and Whitespace

I have searched and still don't have a clear picture of this.
I have XML that is saved locally as xml.xml:

<ITEM NAME='Sample'>
   ..some other node here
</ITEM >
<ITEM NAME='SamPlE lorem'>
   ..some other node here
</ITEM >
<ITEM  NAME='Sam Ple lorem ipsum'>
   ..some other node here
</ITEM >
<ITEM  NAME='sample'>
   ..some other node here
</ITEM >
<ITEM  NAME='SAMPLE'>
   ..some other node here
</ITEM >

$xmlfile = 'localhost/project/xml.xml';
$xml = simplexml_load_file($xmlfile);

I need to search for the string "sample", ignoring case and whitespace, so that every node in the XML above matches. All I have so far is this:

 // ITEM is not a parent node, which is why I am using this line
 // to lead me to the part of my XML
 // that matches my contains() search

 $string = "sample";
 $result = $xml->xpath("//ITEM[contains(@NAME, '$string')]");

but the only result I get is:

<ITEM  NAME='sample'>
   ..some other node here
</ITEM >

I also tried the translate function described in "How do i make Xpath search case insensitive", but I always got an error.

SimpleXML's XPath is not well suited to doing the whole job. Case-insensitive search in particular is pretty awkward, and the related question confronts you with more than you actually need.

One way to simplify the job is to divide it: first get the list of all interesting elements/attributes, then filter them, and then get all their parent elements.

This can easily be done by turning the XPath result (which is an array) into an Iterator:

$string   = "sample";
$names    = $xml->xpath('//ITEM/@NAME');
$filtered = new LaxStringFilterIterator($names, $string);
$items    = new SimpleXMLParentNodesIterator($filtered);

foreach ($items as $item) {
    echo $item->asXML(), "\n";
}

This will then output the matching nodes (for example):

<ITEM NAME="Sample">
   ..some other node here
</ITEM>
<ITEM NAME="SamPlE lorem">
   ..some other node here
</ITEM>
<ITEM NAME="Sam Ple lorem ipsum">
   ..some other node here
</ITEM>
<ITEM NAME="sample">
   ..some other node here
</ITEM>
<ITEM NAME="SAMPLE">
   ..some other node here
</ITEM>

And the separate solution for filtering the array based on the string value:

/**
 * Class LaxStringFilterIterator
 *
 * Search for needle in a case-insensitive manner on a subject
 * with whitespace removed.
 */
class LaxStringFilterIterator extends FilterIterator
{
    private $quoted;

    /**
     * @param Traversable|array|object $it
     * @param string $needle
     */
    public function __construct($it, $needle) {
        parent::__construct($it instanceof Traversable ? new IteratorIterator($it) : new ArrayIterator($it));
        // Escape regex metacharacters, including the "/" delimiter used in accept()
        $this->quoted = preg_quote($needle, '/');
    }

    public function accept(): bool {
        $pattern = sprintf('/%s/i', $this->quoted);
        // Strip all whitespace from the subject before matching
        $subject = preg_replace('/\s+/', '', (string) parent::current());
        return preg_match($pattern, $subject) === 1;
    }
}
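
For comparison, here is a minimal sketch of the same lax filter using PHP's built-in CallbackFilterIterator instead of a dedicated class. It runs on a plain array of strings rather than on SimpleXML attributes, and the sample values (including the non-matching 'other') are assumptions made up for illustration:

```php
<?php
// Illustrative values; 'other' is added only to show a non-match.
$names  = ['Sample', 'SamPlE lorem', 'Sam Ple lorem ipsum', 'sample', 'SAMPLE', 'other'];
$needle = 'sample';

// Case-insensitive pattern with the needle escaped for the "/" delimiter.
$pattern = '/' . preg_quote($needle, '/') . '/i';

$filtered = new CallbackFilterIterator(
    new ArrayIterator($names),
    function ($current) use ($pattern) {
        // Remove all whitespace, then look for the needle anywhere in the value.
        $subject = preg_replace('/\s+/', '', (string) $current);
        return preg_match($pattern, $subject) === 1;
    }
);

// Re-index from 0 so the result is a plain list.
$matches = iterator_to_array($filtered, false);
```

Each value is cast to a string before matching, so the same callback should also work on the attribute elements returned by xpath().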

And the parent nodes decorator:

/**
 * Class SimpleXMLParentNodesIterator
 *
 * Return parent nodes instead of current SimpleXMLElement Nodes,
 * for example the element of an attribute.
 */
class SimpleXMLParentNodesIterator extends IteratorIterator
{
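    // Silences the PHP 8.1+ return-type deprecation; parsed as a comment on PHP 7
    #[\ReturnTypeWillChange]
    public function current() {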
        $current = parent::current();
        list($parent) = $current[0]->xpath('..');
        return $parent;
    }
}
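
The same three steps (collect the attributes, filter them, fetch the parents) can also be sketched with a plain loop and no iterator classes. This is only an illustration: the <ROOT> wrapper and the 'unrelated' item are assumptions, since the question shows just the ITEM fragments.

```php
<?php
// Hypothetical document: <ROOT> and the 'unrelated' item are made up for this sketch.
$xml = simplexml_load_string(
    "<ROOT>"
    . "<ITEM NAME='Sample'/>"
    . "<ITEM NAME='SamPlE lorem'/>"
    . "<ITEM NAME='Sam Ple lorem ipsum'/>"
    . "<ITEM NAME='sample'/>"
    . "<ITEM NAME='SAMPLE'/>"
    . "<ITEM NAME='unrelated'/>"
    . "</ROOT>"
);

$string = 'sample';
$items  = [];

// Step 1: collect every NAME attribute of an ITEM element.
foreach ($xml->xpath('//ITEM/@NAME') as $name) {
    // Step 2: strip whitespace and compare case-insensitively.
    $subject = preg_replace('/\s+/', '', (string) $name);
    if (stripos($subject, $string) !== false) {
        // Step 3: go from the attribute to its parent element
        // (same [0] idiom as in SimpleXMLParentNodesIterator).
        list($parent) = $name[0]->xpath('..');
        $items[] = $parent;
    }
}

foreach ($items as $item) {
    echo $item->asXML(), "\n";
}
```

The iterator classes above keep filtering and parent lookup reusable as separate pieces; the loop trades that for brevity.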

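If you want to get every @NAME that starts with 'sample', ignoring case and any space in between, you can use the expression below. Note that matches() is an XPath 2.0 function; SimpleXML's xpath() is built on libxml2, which implements XPath 1.0 only, so this needs an XPath 2.0 processor: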

//ITEM[matches(normalize-space(@NAME), '^[sS]\s?[aA]\s?[mM]\s?[pP]\s?[lL]\s?[eE]')]

output: all items
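
Within XPath 1.0, which is what SimpleXML offers, a comparable "contains" search can be sketched with translate(): it lowercases A-Z and drops spaces, because a character in the second argument with no counterpart in the third is removed. The sketch assumes an ASCII-only needle without quote characters; the <ROOT> wrapper and the 'unrelated' item are made up for illustration.

```php
<?php
// Hypothetical document: <ROOT> and the 'unrelated' item are assumptions.
$xml = simplexml_load_string(
    "<ROOT>"
    . "<ITEM NAME='Sample'/>"
    . "<ITEM NAME='SamPlE lorem'/>"
    . "<ITEM NAME='Sam Ple lorem ipsum'/>"
    . "<ITEM NAME='sample'/>"
    . "<ITEM NAME='SAMPLE'/>"
    . "<ITEM NAME='unrelated'/>"
    . "</ROOT>"
);

$string = 'sample';
$upper  = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ';
$lower  = 'abcdefghijklmnopqrstuvwxyz';

// The trailing space after the upper-case letters has no counterpart
// in the replacement string, so translate() deletes spaces.
$query = sprintf(
    "//ITEM[contains(translate(@NAME, '%s ', '%s'), '%s')]",
    $upper,
    $lower,
    strtolower($string) // assumption: needle contains no quotes or non-ASCII letters
);

$result = $xml->xpath($query);

foreach ($result as $item) {
    echo $item->asXML(), "\n";
}
```

Unlike the matches() expression, this does not anchor the search at the start of the attribute value.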
