XQuery on MarkLogic using OR

This is a newbie MarkLogic question. Imagine an XML structure like this, a condensed version of my real business problem:

<Person id="1">
  <Name>Bob</Name>
  <City>Oakland</City>
  <Phone>2122931022</Phone>
  <Phone>3123032902</Phone>
</Person>

Note that a document can and will have multiple Phone elements.

I have a requirement to return information from EVERY document that has a Phone element that matches ANY of a list of phone numbers. The list may have a couple of dozen phone numbers in it.

I have tried this:

let $a := cts:word-query("3738494044")
let $b := cts:word-query("2373839383") 
let $c := cts:word-query("3933849383") 
let $or := cts:or-query( ($a, $b, $c) )
return cts:search(/Person/Phone, $or)

which runs the query properly, but it returns a sequence of Phone elements inside a Results element. My goal is instead to return the Name and City elements along with the id attribute of the Person element, for every matching document. Example:

<results>
  <match id="18" phone="2123339494" name="bob" city="oakland"/>
  <match id="22" phone="3940594844" name="mary" city="denver"/>
etc...
</results>

So I think I need some form of cts:search that offers this boolean capability but also lets me specify which part of each document gets returned. At that point I could further process the result with XPath. I need to do this efficiently, so for example I think it would NOT be efficient to return a list of document URIs and then query for each document in a loop. Thanks!

Your approach is not as bad as you might think. Only a few changes are necessary to make it work the way you want.

First of all, you are better off using cts:element-value-query instead of cts:word-query. It allows you to limit the searched values to a specific element. It performs best when you add an element range index for that element, but that is not required. It can rely on the always-present word index as well.

Secondly, there is no need for the cts:or-query. Both cts:word-query and cts:element-value-query (as well as all other related functions) accept multiple search strings as one sequence argument. The strings are automatically treated as an or-query.

Thirdly, the phone numbers are your 'primary key' in the result, so returning a list of all matching Phone elements is the way to go. You just need to realize that the resulting Phone elements are still aware of where they came from. You can easily use XPath to navigate to their parent and siblings.

Fourthly, there is nothing wrong with looping over the search results. It may sound a bit odd, but it doesn't cost much extra performance. In MarkLogic Server, at least, the cost is pretty much negligible. Performance is most likely to suffer when you try to return many results (more than several thousand), in which case most of the time is spent serializing them all. And if you are likely to have to handle lots of search results, it is wise to start using pagination straight away.

To get what you ask for, you could use the following code:

<results>{
    for $phone in
        cts:search(
            doc()/Person/Phone,
            cts:element-value-query(
                xs:QName("Phone"),
                ("3738494044", "2373839383", "3933849383")
            )
        )
    return
        <match id="{data($phone/../@id)}" phone="{data($phone)}" name="{data($phone/../Name)}" city="{data($phone/../City)}"/>
}</results>
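
If the result set may grow large, the pagination mentioned above can be sketched roughly as follows. This is only an illustration: the $page and $page-size variables are hypothetical names that are not part of the original code, and fn:subsequence is used to take one page of matches from the search.

```xquery
xquery version "1.0-ml";

(: Hypothetical paging parameters, introduced for illustration only. :)
let $page := 1
let $page-size := 10
let $start := ($page - 1) * $page-size + 1
let $numbers := ("3738494044", "2373839383", "3933849383")
return
<results page="{$page}">{
    (: Take only one page of matching Phone elements from the search. :)
    for $phone in
        fn:subsequence(
            cts:search(
                doc()/Person/Phone,
                cts:element-value-query(xs:QName("Phone"), $numbers)
            ),
            $start,
            $page-size
        )
    return
        <match id="{data($phone/../@id)}" phone="{data($phone)}" name="{data($phone/../Name)}" city="{data($phone/../City)}"/>
}</results>
```

Requesting the next page is then a matter of incrementing $page, so only a bounded number of matches is serialized per request.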

Best of luck.

Here's what I would do:

let $numbers := ("3738494044", "2373839383", "3933849383")
return
<results>{
    for $person in cts:search(/Person, cts:element-value-query(xs:QName("Phone"),$numbers))
    return
    <match id="{data($person/@id)}" name="{data($person/Name)}" city="{data($person/City)}">
      {
        for $phone in $person/Phone[cts:contains(.,$numbers)]
        return element phone {$phone}
      }
    </match>

}</results>

First, there's an implicit OR when passing multiple values into word-query, value-query, and their cousins, and this form is resolved more efficiently from the indexes, so use it when you can.

Second, an individual might match on more than one phone number, so you need that additional inner loop to effectively group by individual.

I would not create a range index for this - there is no need, and it isn't necessarily faster. There are indexes for element values by default, so you can leverage those with element-value-query.

You could do all of this with the Search API and a little XSLT. That would make it easy to start combining names, numbers, and other conditions in a single query.
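
As a rough sketch of the Search API route, assuming a value constraint on the Phone element: the constraint name "phone" and the query string are illustrative choices, not something given in the question, and the XSLT step that would reshape the response into match elements is left out.

```xquery
xquery version "1.0-ml";

import module namespace search = "http://marklogic.com/appservices/search"
    at "/MarkLogic/appservices/search/search.xqy";

(: Assumption: the constraint name "phone" is chosen for illustration.
   It binds the query-string prefix "phone:" to values of the Phone element. :)
let $options :=
    <options xmlns="http://marklogic.com/appservices/search">
        <constraint name="phone">
            <value>
                <element ns="" name="Phone"/>
            </value>
        </constraint>
    </options>
return
    (: OR in the query string combines the individual phone constraints. :)
    search:search(
        "phone:3738494044 OR phone:2373839383 OR phone:3933849383",
        $options
    )
```

The call returns a search:response element describing the matching documents. Further constraints, for example on Name or City, could be added to the same options node and combined in the query string.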
