
How can I use XPath to perform a case-insensitive search and support non-English characters?

I am performing a search in an XML file, using the following code:

$result = $xml->xpath("//StopPoint[contains(StopName, '$query')]");

Where $query is the search query and StopName is the name of a bus stop. The problem is that it's case-sensitive.

Not only that, I would also like to be able to search with non-English characters such as ÆØÅæøå to return Norwegian names.

How is this possible?

In XPath 1.0 (which is, I believe, the best you can get with PHP SimpleXML), you'd have to use the translate() function to produce all-lowercase output from mixed-case input.

For convenience, I would wrap it in a function like this:

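function findStopPointByName($xml, $query) {
  // translate() maps the Nth character of $upper to the Nth character of $lower
  $upper = "ABCDEFGHIJKLMNOPQRSTUVWXYZÆØÅ"; // add any characters...
  $lower = "abcdefghijklmnopqrstuvwxyzæøå"; // ...that are missing

  // Lower-case both the StopName text and the search term inside the XPath expression
  $arg_stopname = "translate(StopName, '$upper', '$lower')";
  $arg_query    = "translate('$query', '$upper', '$lower')";

  // Select every StopPoint whose lower-cased name contains the lower-cased query
  return $xml->xpath("//StopPoint[contains($arg_stopname, $arg_query)]");
}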

As a sanitizing measure, I would either forbid single quotes in $query entirely or escape them, because an unhandled single quote will break your XPath string.
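A minimal sketch of the "escape" option, assuming PHP 7+ with SimpleXML. An XPath 1.0 string literal cannot contain its own delimiter, so the hypothetical helper xpathLiteral() switches to double quotes when the query contains an apostrophe, and falls back to concat() when it contains both kinds of quote. findStopPointByNameSafe() is also a hypothetical name; its alphabets are the ones from the function above and still need extending for other languages.

```php
<?php
// Hypothetical helper: turn any PHP string into a valid XPath 1.0 string expression.
function xpathLiteral(string $s): string
{
    if (strpos($s, "'") === false) {
        return "'" . $s . "'";            // no apostrophe: single-quoted literal
    }
    if (strpos($s, '"') === false) {
        return '"' . $s . '"';            // no double quote: double-quoted literal
    }
    // Both quote kinds present: split on apostrophes and rebuild with concat(),
    // emitting each apostrophe as the double-quoted literal "'".
    $pieces = [];
    foreach (explode("'", $s) as $i => $part) {
        if ($i > 0) {
            $pieces[] = "\"'\"";
        }
        if ($part !== '') {
            $pieces[] = "'" . $part . "'";
        }
    }
    return 'concat(' . implode(', ', $pieces) . ')';
}

// Hypothetical variant of findStopPointByName() that quotes the query safely.
function findStopPointByNameSafe(SimpleXMLElement $xml, string $query): array
{
    $upper = "ABCDEFGHIJKLMNOPQRSTUVWXYZÆØÅ"; // extend with any missing characters
    $lower = "abcdefghijklmnopqrstuvwxyzæøå";
    $needle = xpathLiteral($query);

    $result = $xml->xpath(
        "//StopPoint[contains(translate(StopName, '$upper', '$lower'), "
        . "translate($needle, '$upper', '$lower'))]"
    );
    return $result ?: [];                 // xpath() may return false/null on error
}
```

With this, a query such as O'CONNOR is passed to XPath as "O'CONNOR" and no longer terminates the literal early, while the translate() calls still make the comparison case-insensitive for the listed letters.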

In XPath 2.0 you can use the lower-case() function, which is Unicode-aware, so it handles non-ASCII characters correctly.

contains(lower-case(StopName), lower-case('$query'))

To use XPath 2.0 you need an XSLT 2.0 processor, for example Saxon. You can access it from PHP via JavaBridge.

Non-English names should not be a problem: just put them in your XPath expression as they are (XML is defined in terms of Unicode).
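A minimal sketch of this point, assuming PHP with SimpleXML and a UTF-8 encoded script; the stop names below are sample data made up for illustration. The Norwegian characters go straight into the expression with no escaping or special handling.

```php
<?php
// Sample data; the string is UTF-8, which is libxml's default encoding.
$xml = new SimpleXMLElement(
    '<Stops>'
    . '<StopPoint><StopName>Bøler</StopName></StopPoint>'
    . '<StopPoint><StopName>Ørsta</StopName></StopPoint>'
    . '</Stops>'
);

// Non-ASCII characters are used in the XPath literal exactly as typed.
$matches = $xml->xpath("//StopPoint[contains(StopName, 'Ør')]");

foreach ($matches as $stop) {
    echo $stop->StopName, PHP_EOL;        // prints "Ørsta"
}
```

The match is still case-sensitive, though: searching for 'ør' would find nothing here, which is the remaining problem.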

As for case-insensitivity, ...

The XPath 1.0 specification includes the following statement:

Two strings are equal if and only if they consist of the same sequence of UCS characters.

So even using explicit predicates on the local-name will not help.
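To see the quoted rule in action, here is a minimal sketch assuming PHP's DOM extension, where DOMXPath::evaluate() returns a PHP bool for boolean XPath expressions. Strings that differ only in case are different sequences of characters, so both '=' and contains() treat them as different.

```php
<?php
// Any loaded document will do; the expressions below only compare literals.
$doc = new DOMDocument();
$doc->loadXML('<root/>');
$xpath = new DOMXPath($doc);

// Same characters, same case: equal.
var_dump($xpath->evaluate("'Ålesund' = 'Ålesund'"));       // bool(true)

// 'Å' (U+00C5) and 'å' (U+00E5) are different UCS characters: not equal.
var_dump($xpath->evaluate("'Ålesund' = 'ålesund'"));       // bool(false)

// contains() follows the same character-by-character rule.
var_dump($xpath->evaluate("contains('Ålesund', 'åle')"));  // bool(false)
```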

XPath 2.0 includes functions to map case, e.g. fn:upper-case.


Additionally, XPath's translate() function can be used to fake case mapping in XPath 1.0, but its arguments will need to include every cased code point you and your users will ever need:

"test" = translate($inputString, "abcdefghijklmnopqrstuvwxyz", "ABCDEFGHIJKLMNOPQRSTUVWXYZ")
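The "every cased code point" caveat matters for the Norwegian letters in the question. A minimal sketch, again assuming PHP's DOMXPath: with an ASCII-only mapping, 'ø' passes through translate() unchanged, so the comparison fails until the Norwegian letters are added to both arguments at matching positions. The $inputString variable from the expression above is replaced by a literal here for simplicity.

```php
<?php
$doc = new DOMDocument();
$doc->loadXML('<root/>');
$xpath = new DOMXPath($doc);

$asciiLower = 'abcdefghijklmnopqrstuvwxyz';
$asciiUpper = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ';

// ASCII-only mapping: 'ø' is not in the second argument, so it is left as is.
$ascii = "translate('bøler', '$asciiLower', '$asciiUpper')";
var_dump($xpath->evaluate("$ascii = 'BØLER'"));   // bool(false)
var_dump($xpath->evaluate("string($ascii)"));     // string(6) "BøLER"

// Adding the Norwegian letters to both alphabets (same positions) fixes it.
$fullLower = $asciiLower . 'æøå';
$fullUpper = $asciiUpper . 'ÆØÅ';
$full = "translate('bøler', '$fullLower', '$fullUpper')";
var_dump($xpath->evaluate("$full = 'BØLER'"));    // bool(true)
```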

In addition:

$xml->xpath("//StopPoint[contains(StopName, '$query')]");

You will need to strip out any apostrophe characters from $query to avoid breaking your expression.
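A minimal sketch of that advice, assuming SimpleXML as in the question; str_replace() drops every apostrophe before the query is interpolated. The function name searchStops() and the sample data are hypothetical.

```php
<?php
// Hypothetical wrapper around the question's query.
function searchStops(SimpleXMLElement $xml, string $query): array
{
    // Remove apostrophes so they cannot terminate the '...' literal early.
    $query = str_replace("'", '', $query);

    $result = $xml->xpath("//StopPoint[contains(StopName, '$query')]");
    return $result ?: [];                 // xpath() may return false/null on error
}

$xml = new SimpleXMLElement(
    '<Stops><StopPoint><StopName>Kongens gate</StopName></StopPoint></Stops>'
);

// Unstripped, this query would end the literal after "Kongen".
var_dump(count(searchStops($xml, "Kongen's gate")));  // int(1)
```

The trade-off is that stop names which really contain an apostrophe can no longer be matched by typing it; escaping the quote instead, as suggested earlier in the thread, avoids that.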

In XPath 2.0 you can double the quote character used as the delimiter to put that quote into a string literal, but in XPath 1.0 a string literal cannot contain its own delimiter (you can only switch to the other quote character or assemble the string with concat()).

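Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.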

 
© 2020-2024 STACKOOM.COM