Case-insensitive search - XPath

I'm trying to do a case-insensitive search on my XML document using the XPath expression below. Apparently I've got it wrong, since the results are different. Hoping someone here can point out my mistake?

I'm trying to get a count of all Obj elements under <Sect> where the <Header> value is "Primary Objectives". To get the count, I'm using the expression below, which works great.

Expression without case-insensitivity (returns 31 nodes):

$count = $dom->findvalue("count(//TaggedPDF-doc//Part//Sect//Sect//Sect[contains(Header,\"Primary objectives\")]//OBJ)");

But I want to make "Primary Objectives" case-insensitive, so I tried to use translate() for that. Expression with translate() added to make "Primary Objectives" case-insensitive (returns 0 nodes):

$count = $dom->findvalue("count(//TaggedPDF-doc//Part//Sect//Sect//Sect[contains(H4,
         translate(\"Primary Objectives\", 
                   'ABCDEFGHJIKLMNOPQRSTUVWXYZ', 
                   'abcdefghjiklmnopqrstuvwxyz')
         )
]//OBJ)");

Hoping someone here can point out where I got this wrong.

Thanks in advance, Simak

First off, you probably don't need all those // steps, since a single // allows any number of levels of elements between the nodes named on either side. Either enumerate the full path from the root using single / steps, or just use one // to search the whole tree.

Secondly, you need to lowercase the Header value you're comparing, not the fixed string you're comparing against. Try something more like

count(//Sect[
          Header[
            contains(
              translate(
                .,
                'ABCDEFGHIJKLMNOPQRSTUVWXYZ',
                'abcdefghijklmnopqrstuvwxyz'),
              'primary objectives'
            )
          ]
        ]//Obj)

which gives you the count of Obj elements that occur anywhere inside a Sect that has any Header child containing "primary objectives" (case-insensitively). This is slightly different from

count(//Sect[contains(translate(Header, ....

when a Sect contains more than one Header: the latter only checks the first Header in each Sect rather than looking for a match in any of them, because when translate() is given a node-set it converts it to a string using only the first node in document order.
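To make the "any Header" versus "first Header" difference concrete, here is a rough Python sketch that emulates the two XPath 1.0 predicates with the standard library's xml.etree.ElementTree (which cannot evaluate translate() itself). The sample document and the function names are made up for illustration; the ASCII-only folding mirrors what translate() with an A-Z/a-z argument pair does.

```python
import string
import xml.etree.ElementTree as ET

# Same effect as XPath 1.0 translate(s, 'ABC...Z', 'abc...z'): only ASCII A-Z is folded.
ASCII_FOLD = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)

# Hypothetical sample: in the first Sect the matching Header is the second one.
SAMPLE = """
<Doc>
  <Sect>
    <Header>Background</Header>
    <Header>PRIMARY Objectives</Header>
    <Obj/><Obj/>
  </Sect>
  <Sect>
    <Header>Primary objectives</Header>
    <Obj/>
  </Sect>
</Doc>
"""

def string_value(elem):
    """XPath string-value of an element: all descendant text concatenated."""
    return "".join(elem.itertext())

def count_any_header(root, needle):
    """Like count(//Sect[Header[contains(translate(., ...), needle)]]//Obj)."""
    found = set()  # behaves like a node-set: each Obj counted once even if Sects nest
    for sect in root.iter("Sect"):
        if any(needle in string_value(h).translate(ASCII_FOLD)
               for h in sect.findall("Header")):
            found.update(sect.iter("Obj"))
    return len(found)

def count_first_header(root, needle):
    """Like count(//Sect[contains(translate(Header, ...), needle)]//Obj):
    translate() only sees the first Header child ("" if there is none)."""
    found = set()
    for sect in root.iter("Sect"):
        first = sect.find("Header")
        text = string_value(first) if first is not None else ""
        if needle in text.translate(ASCII_FOLD):
            found.update(sect.iter("Obj"))
    return len(found)

root = ET.fromstring(SAMPLE.strip())
print(count_any_header(root, "primary objectives"))    # 3
print(count_first_header(root, "primary objectives"))  # 1
```

The first Sect is only counted when every Header is inspected, which is exactly the case the nested Header[...] predicate handles and the plain translate(Header, ...) form misses.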

If you have access to an XPath 2.0 (or later) implementation (XQuery is a superset of XPath 2.0, so any XQuery processor provides one), you could use

count(
  //TaggedPDF-doc//Part//Sect//Sect//Sect[
    contains(lower-case(H4), 'exclusion criteria')
  ]//OBJ
)

Perl interfaces to XPath 2.0 processors (mostly XML databases or processors with XQuery support) exist for eXist DB, BaseX, Saxon and lots of others.

You need to fold both strings; here the constant is simply written in lowercase already:

contains(translate(Header, '...', '...'), 'primary objectives')

Note that, because only the letters that occur in the search string need folding, you can use

# Letters of "primary objectives"
'ABCEIJMOPRSTVY', 'abceijmoprstvy'

instead of the larger but still limited set

# Some of the Latin letters
'ABCDEFGHJIKLMNOPQRSTUVWXYZ', 'abcdefghjiklmnopqrstuvwxyz'
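If you build these expressions from Perl or another host language, the shorter argument pair can be derived from the search string itself. Below is a small Python sketch of that idea; `fold_args` and `ci_contains` are hypothetical helper names. It assumes the search string contains no apostrophe (XPath 1.0 string literals have no escape mechanism) and, like the A-Z variant, folds only ASCII letters.

```python
import string

def fold_args(needle):
    """Return the (from, to) translate() arguments covering only the
    ASCII letters that occur in needle."""
    letters = sorted({c.lower() for c in needle if c in string.ascii_letters})
    lower = "".join(letters)
    return lower.upper(), lower

def ci_contains(node_expr, needle):
    """Build an XPath 1.0 case-insensitive contains() test (hypothetical helper)."""
    if "'" in needle:
        raise ValueError("this sketch does not handle apostrophes in the needle")
    upper, lower = fold_args(needle)
    # Fold the needle with the same mapping so both sides of contains() agree.
    folded = needle.translate(str.maketrans(upper, lower))
    return f"contains(translate({node_expr}, '{upper}', '{lower}'), '{folded}')"

print(fold_args("primary objectives"))
# ('ABCEIJMOPRSTVY', 'abceijmoprstvy')
print(ci_contains("Header", "Primary Objectives"))
# contains(translate(Header, 'ABCEIJMOPRSTVY', 'abceijmoprstvy'), 'primary objectives')
```

The generated test can then be dropped into a predicate, for example with "." as the node expression inside Header[...].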

What your expression does is check whether the content of H4 contains "Exclusion criteria" converted to lowercase:

$count = $dom->findvalue("count(//TaggedPDF-doc//Part//Sect//Sect//Sect[contains(H4,
         translate(\"Exclusion criteria\",
                   'ABCDEFGHJIKLMNOPQRSTUVWXYZ',
                   'abcdefghjiklmnopqrstuvwxyz')
         )
]//OBJ)");

That is the same as doing:

$count = $dom->findvalue("count(//TaggedPDF-doc//Part//Sect//Sect//Sect[contains(
        H4, \"exclusion criteria\"
     )
]//OBJ)");

What you want is to translate the content of H4 to lowercase and compare it with the lowercase version of what you're searching for, in this case "exclusion criteria":

$count = $dom->findvalue("count(//TaggedPDF-doc//Part//Sect//Sect//Sect[contains(
     translate(H4, 
         'ABCDEFGHJIKLMNOPQRSTUVWXYZ', 
         'abcdefghjiklmnopqrstuvwxyz'), 
     \"exclusion criteria\"
     )
]//OBJ)");
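A quick way to see why the first version returns nothing and the second works is to model the two contains() calls with plain Python strings. This is only an illustration, not XPath: `FOLD` mimics translate() with the A-Z/a-z argument pair, and the H4 text is a made-up example value.

```python
import string

# Mimics XPath 1.0 translate(s, 'ABC...Z', 'abc...z').
FOLD = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)

h4 = "Exclusion Criteria"  # hypothetical H4 content in the document

# First version: contains(H4, translate("Exclusion criteria", ...))
folded_constant = "Exclusion criteria".translate(FOLD)
print(folded_constant)        # exclusion criteria
print(folded_constant in h4)  # False: H4 itself is still mixed-case

# Second version: contains(translate(H4, ...), "exclusion criteria")
print("exclusion criteria" in h4.translate(FOLD))  # True
```

Folding the constant only changes the literal you search for; the node content keeps its original case, so the comparison is still case-sensitive.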
