
How can I use XPath to perform a case-insensitive search and support non-English characters?

I am performing a search in an XML file, using the following code:

$result = $xml->xpath("//StopPoint[contains(StopName, '$query')]");

Here, $query is the search query and StopName is the name of a bus stop. The problem is that the search is case-sensitive.

On top of that, I would also like to be able to search with non-English characters such as ÆØÅæøå to find Norwegian names.

How is this possible?

In XPath 1.0 (which is, I believe, the best you can get with PHP SimpleXML), you'd have to use the translate() function to produce all-lowercase output from mixed-case input.

For convenience, I would wrap it in a function like this:

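function findStopPointByName($xml, $query) {
  $upper = "ABCDEFGHIJKLMNOPQRSTUVWXYZÆØÅ"; // add any characters...
  $lower = "abcdefghijklmnopqrstuvwxyzæøå"; // ...that are missing

  // translate() replaces each character of $upper with the character at the
  // same position in $lower, so both strings must list letters in the same order
  $arg_stopname = "translate(StopName, '$upper', '$lower')";
  $arg_query    = "translate('$query', '$upper', '$lower')";

  // the predicate needs its closing "]" after contains(...)
  return $xml->xpath("//StopPoint[contains($arg_stopname, $arg_query)]");
}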

As a sanitizing measure, I would either forbid single quotes in $query completely or strip them out, because an unhandled single quote ends the '...' literal early and breaks your XPath expression.
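Below is a minimal, self-contained sketch of the function above with the "forbid single quotes" measure applied, plus a tiny demo. The sample XML, the stop names and the choice of throwing an InvalidArgumentException are assumptions made for illustration; the script is also assumed to be saved as UTF-8 so that Æ, Ø and Å reach libxml as UTF-8.

```php
<?php
// Sketch: findStopPointByName() with single quotes rejected up front.
// The sample data below is made up for illustration.

function findStopPointByName(SimpleXMLElement $xml, string $query): array {
    // An XPath 1.0 '...' literal cannot contain a single quote, so refuse it.
    if (strpos($query, "'") !== false) {
        throw new InvalidArgumentException('Query must not contain single quotes');
    }

    $upper = 'ABCDEFGHIJKLMNOPQRSTUVWXYZÆØÅ'; // extend both strings...
    $lower = 'abcdefghijklmnopqrstuvwxyzæøå'; // ...in the same order

    $argStopName = "translate(StopName, '$upper', '$lower')";
    $argQuery    = "translate('$query', '$upper', '$lower')";

    $result = $xml->xpath("//StopPoint[contains($argStopName, $argQuery)]");
    // xpath() can return false/null on error; normalize that to "no matches".
    return is_array($result) ? $result : [];
}

$xml = simplexml_load_string(
    '<StopPoints>'
  . '<StopPoint><StopName>Bjørvika</StopName></StopPoint>'
  . '<StopPoint><StopName>Oslo S</StopName></StopPoint>'
  . '</StopPoints>'
);

foreach (findStopPointByName($xml, 'BJØR') as $stop) {
    echo $stop->StopName, "\n"; // Bjørvika
}
```

Rejecting the query is the simplest choice; if you prefer stripping quotes instead, replace the exception with `$query = str_replace("'", '', $query);`.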

In XPath 2.0 you can use the lower-case() function, which is Unicode-aware, so it will handle non-ASCII characters fine.

contains(lower-case(StopName), lower-case('$query'))

To use XPath 2.0 you need an XPath 2.0 (or XSLT 2.0) processor, for example Saxon. You can access it from PHP via JavaBridge.

Non-English names should not be a problem: just put them in your XPath expression, since XML is defined in terms of Unicode.

As for case-insensitivity, ...

XPath 1.0 includes the following statement:

Two strings are equal if and only if they consist of the same sequence of UCS characters.

So string comparison in XPath 1.0 is always case-sensitive, and even explicit predicates on local-name() will not help.

XPath 2.0 includes case-mapping functions, e.g. fn:upper-case and fn:lower-case.


Additionally, XPath's translate() function lets you fake case mapping in XPath 1.0, but the two translation strings must include every cased code point that you and your users will ever need:

"TEST" = translate($inputString, "abcdefghijklmnopqrstuvwxyz", "ABCDEFGHIJKLMNOPQRSTUVWXYZ")
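To see that caveat in practice, here is a small PHP sketch that performs a case-insensitive equality test with translate() through SimpleXML. The stop names and the helper countEqualIgnoringCase() are made up for illustration. With an ASCII-only table the Norwegian letters pass through unchanged, so "ærøveien" does not match "Ærøveien" until æ/ø/å and Æ/Ø/Å are added to both strings.

```php
<?php
// Sketch: case-insensitive equality in XPath 1.0 via translate().
// Sample data and the helper name are hypothetical.
$xml = simplexml_load_string(
    '<StopPoints>'
  . '<StopPoint><StopName>Ærøveien</StopName></StopPoint>'
  . '<StopPoint><StopName>Oslo S</StopName></StopPoint>'
  . '</StopPoints>'
);

// ASCII-only mapping: Æ/æ, Ø/ø and Å/å are NOT covered.
$asciiLower = 'abcdefghijklmnopqrstuvwxyz';
$asciiUpper = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ';

function countEqualIgnoringCase(SimpleXMLElement $xml, string $name,
                                string $lower, string $upper): int {
    // Upper-case both sides with the same table, then compare with "=".
    $expr = "//StopPoint[translate(StopName, '$lower', '$upper')"
          . " = translate('$name', '$lower', '$upper')]";
    return count($xml->xpath($expr));
}

echo countEqualIgnoringCase($xml, 'oslo s', $asciiLower, $asciiUpper), "\n";   // 1
echo countEqualIgnoringCase($xml, 'ærøveien', $asciiLower, $asciiUpper), "\n"; // 0
echo countEqualIgnoringCase($xml, 'ærøveien',
    $asciiLower . 'æøå', $asciiUpper . 'ÆØÅ'), "\n";                         // 1
```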

In addition:

$xml->xpath("//StopPoint[contains(StopName, '$query')]");

You will need to strip out any apostrophe characters from $query to avoid breaking your expression.

In XPath 2.0 you can double the quote character used as the delimiter to put that quote inside a string literal, but in XPath 1.0 a string literal cannot contain its own delimiter at all.
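A common workaround in XPath 1.0 is to delimit the literal with whichever quote character the string does not contain, and to fall back to concat() when it contains both. The helper below, xpathLiteral(), is a hypothetical name for a sketch of that idea; it is not part of SimpleXML or of either answer, and the sample XML is made up.

```php
<?php
// Sketch: turn any PHP string into a valid XPath 1.0 string expression.
function xpathLiteral(string $s): string {
    if (strpos($s, "'") === false) {
        return "'" . $s . "'";          // no single quotes: '...' is safe
    }
    if (strpos($s, '"') === false) {
        return '"' . $s . '"';          // no double quotes: "..." is safe
    }
    // Both quote kinds present: split on single quotes and re-insert each
    // one as a "'" literal inside concat() (always >= 2 arguments here).
    $pieces = [];
    foreach (explode("'", $s) as $i => $part) {
        if ($i > 0) {
            $pieces[] = '"\'"';
        }
        if ($part !== '') {
            $pieces[] = "'" . $part . "'";
        }
    }
    return 'concat(' . implode(', ', $pieces) . ')';
}

$xml = simplexml_load_string(
    "<StopPoints><StopPoint><StopName>O'Hare \"Terminal\"</StopName></StopPoint></StopPoints>"
);
$query = "O'Hare \"T";
$hits = $xml->xpath('//StopPoint[contains(StopName, ' . xpathLiteral($query) . ')]');
echo count($hits), "\n"; // 1
```

Here xpathLiteral($query) produces `concat('O', "'", 'Hare "T')`, so the query no longer has to be stripped of apostrophes.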
