
XQuery - How to query based on the number of elements in an XML document?

I'm still new to XQuery / MarkLogic and I'm having trouble understanding how to query based on the number of elements in an XML document. For example, imagine I have a database of XML documents roughly similar to the following:

<?xml version="1.0" encoding="UTF-8"?>
<bookstore>

  <book category="cooking">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
  </book>

  <book category="children">
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
  </book>

  <book category="web">
    <title lang="en">Learning XML</title>
    <author>Erik T. Ray</author>
    <year>2003</year>
    <price>39.95</price>
  </book>
  
</bookstore>

As you can see, book[2] has no price. Most documents in the database I'm working with either have a price child element for every book or have no price element on any book. My goal is to find only the documents where the price element is missing from some books (like the XML above), and to ignore documents where every book has a price or no book does. So in my head the logic is something along the lines of "return results where the number of price elements is < the number of book elements AND > 0."
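As a language-neutral illustration (Python's standard xml.etree.ElementTree, not MarkLogic code), the rule above can be checked against a single document like this. It is a sketch that assumes the bookstore/book/price layout of the sample; the function name partially_priced is hypothetical.

```python
import xml.etree.ElementTree as ET

# The sample <bookstore> from the question (XML declaration omitted).
SAMPLE = """<bookstore>
  <book category="cooking">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
  </book>
  <book category="children">
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
  </book>
  <book category="web">
    <title lang="en">Learning XML</title>
    <author>Erik T. Ray</author>
    <year>2003</year>
    <price>39.95</price>
  </book>
</bookstore>"""


def partially_priced(xml_text: str) -> bool:
    """Hypothetical helper: True when some, but not all, <book> elements have a <price>."""
    books = ET.fromstring(xml_text).findall("book")
    # Count books that have a price, i.e. count(book[price]) in XPath terms.
    priced = sum(1 for book in books if book.find("price") is not None)
    # The question's rule: 0 < number of priced books < number of books.
    return 0 < priced < len(books)


print(partially_priced(SAMPLE))  # prints True
```

It counts books that have a price rather than raw price elements; the two only differ if a single book could carry more than one price.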

The best I can do so far is the following query:

let $some-docs := cts:search(fn:collection('/my/collection'),
                    cts:and-query((
                      cts:element-query(xs:QName("book"), cts:true-query()),
                      cts:not-query(cts:element-query(xs:QName("price"), cts:true-query()))
                    )))
return xdmp:node-uri($some-docs)

But this obviously only returns documents where book elements exist and no price element exists. I need a way of expressing that I want the documents where the price element exists but is missing from some books.

I would prefer a solution that uses the cts:search function, but any help is appreciated.

"I need a way of indicating I want the documents where the price element exists, but is missing for some books."

So basically you need to find documents that contain both books with the <bookstore><book><price/></book></bookstore> structure and books missing the <price/> child element?

The simplest approach is to modify the existing documents with a tool like CORB, either adding an element that indicates the document matches your criteria or placing those documents in a distinct collection. Then you can use a plain cts query to return the documents that carry that indicator.
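A rough Python sketch of that one-off tagging pass, assuming an in-memory dict stands in for the database and using a hypothetical collection name /partial-price. In a real setup the per-URI classification would run inside the CORB job and the final lookup would be a collection query (for example cts:collection-query); nothing below is MarkLogic or CORB API.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the database: document URI -> XML text.
DOCS = {
    "/stores/a.xml": "<bookstore><book><price>1</price></book><book/></bookstore>",
    "/stores/b.xml": "<bookstore><book><price>1</price></book></bookstore>",
    "/stores/c.xml": "<bookstore><book/><book/></bookstore>",
}

# Hypothetical name for the collection that marks matching documents.
PARTIAL_COLLECTION = "/partial-price"


def is_partial(xml_text):
    """True when some, but not all, books in the document have a <price>."""
    books = ET.fromstring(xml_text).findall("book")
    priced = sum(1 for book in books if book.find("price") is not None)
    return 0 < priced < len(books)


def tag_documents(docs):
    """One batch pass over every URI, like a CORB job: returns URI -> set of collections."""
    return {
        uri: ({PARTIAL_COLLECTION} if is_partial(xml_text) else set())
        for uri, xml_text in docs.items()
    }


def in_collection(collections, name):
    """Query time: a simple membership lookup, no counting needed."""
    return sorted(uri for uri, names in collections.items() if name in names)


print(in_collection(tag_documents(DOCS), PARTIAL_COLLECTION))  # ['/stores/a.xml']
```

The expensive per-document counting happens once, at tagging time, so the query itself only has to test for the marker.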

If you don't want to touch the dataset, you could create field range indexes on the paths /bookstore/book/price and /bookstore/book[not(./price)]/title (field1 and field2 in the query below). Then you just need to query for documents that have values in both fields, with something like:

cts:and-query((
  cts:field-word-query("field1", "*", ("wildcarded")),
  cts:field-word-query("field2", "*", ("wildcarded"))
))
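To see why an and-query over two such fields answers the question, here is a Python simulation of the idea, assuming each field simply records which documents have at least one value on its path. The field names field1 and field2 come from the query above; the URIs and the field_values, build_index and and_query helpers are hypothetical and only mimic what an index lookup does.

```python
import xml.etree.ElementTree as ET

# Hypothetical documents: URI -> XML text.
DOCS = {
    "/stores/a.xml": "<bookstore><book><title>A</title><price>1</price></book>"
                     "<book><title>B</title></book></bookstore>",
    "/stores/b.xml": "<bookstore><book><title>A</title><price>1</price></book></bookstore>",
    "/stores/c.xml": "<bookstore><book><title>A</title></book></bookstore>",
}


def field_values(xml_text):
    """Values each path would contribute for one document (root is <bookstore>)."""
    root = ET.fromstring(xml_text)
    return {
        # /bookstore/book/price
        "field1": [p.text for p in root.findall("book/price")],
        # /bookstore/book[not(./price)]/title
        "field2": [b.findtext("title") for b in root.findall("book") if b.find("price") is None],
    }


def build_index(docs):
    """Inverted index: field name -> URIs that have at least one value in that field."""
    index = {"field1": set(), "field2": set()}
    for uri, xml_text in docs.items():
        for field, values in field_values(xml_text).items():
            if values:
                index[field].add(uri)
    return index


def and_query(index, *fields):
    """An and-query over fields behaves like an intersection of their URI sets."""
    result = set(index[fields[0]])
    for field in fields[1:]:
        result &= index[field]
    return sorted(result)


print(and_query(build_index(DOCS), "field1", "field2"))  # ['/stores/a.xml']
```

field1 alone matches every document with any price and field2 alone matches every document with any unpriced book; only their intersection is "some but not all".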

The count of elements within a document isn't exposed in the indexes, so it isn't available to a query. What you could do instead is apply a predicate filter to the documents returned by the search for those bookstore docs and test whether any book lacks a price:

cts:search(fn:collection('/my/collection'),
  cts:and-query((
    cts:element-query(xs:QName("book"), cts:true-query()),
    cts:element-query(xs:QName("price"), cts:true-query())
  ))
)[bookstore/book[not(price)]]
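The search-then-filter pattern can be mimicked in Python, assuming the index stage only knows which element names occur in each document (roughly what cts:element-query with cts:true-query can resolve) and the predicate then runs on the returned documents. The URIs and helper names are hypothetical. The sketch also shows why the index stage should require price as well as book: with book alone, the not(price) filter also keeps documents in which no book has a price.

```python
import xml.etree.ElementTree as ET

# Hypothetical documents: URI -> XML text.
DOCS = {
    "/stores/a.xml": "<bookstore><book><price>1</price></book><book/></bookstore>",
    "/stores/b.xml": "<bookstore><book><price>1</price></book></bookstore>",
    "/stores/c.xml": "<bookstore><book/><book/></bookstore>",
}


def element_names(xml_text):
    """What an element-existence index can see: which element names occur, not how many."""
    return {el.tag for el in ET.fromstring(xml_text).iter()}


def index_search(docs, required):
    """Stage 1: candidate URIs whose documents contain every required element name."""
    return [uri for uri, xml_text in docs.items() if required <= element_names(xml_text)]


def has_unpriced_book(xml_text):
    """Stage 2 predicate, mirroring [bookstore/book[not(price)]]."""
    root = ET.fromstring(xml_text)
    return any(book.find("price") is None for book in root.findall("book"))


def search_then_filter(docs, required):
    return sorted(uri for uri in index_search(docs, required) if has_unpriced_book(docs[uri]))


print(search_then_filter(DOCS, {"book"}))           # ['/stores/a.xml', '/stores/c.xml']
print(search_then_filter(DOCS, {"book", "price"}))  # ['/stores/a.xml']
```

The filter stage reads each candidate document, so keeping the index stage as selective as possible keeps the amount of filtering down.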
"return results where the number of price elements is < the number of book elements AND > 0"

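You could write not(count(//price) = (count(//book), 0)). Because = is a general comparison, count(//price) = (count(//book), 0) is true when the number of prices equals either the number of books or zero, so wrapping it in not() keeps exactly the documents in between.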

or perhaps

not(empty(//price) or empty(//book[not(price)]))
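Both XPath expressions can be transcribed by hand into Python to check that they select the same documents. This is a manual translation, not an XPath engine, and the helper names count_form and empty_form are hypothetical.

```python
import xml.etree.ElementTree as ET


def count_form(xml_text):
    """not(count(//price) = (count(//book), 0))"""
    root = ET.fromstring(xml_text)
    n_price = len(root.findall(".//price"))
    n_book = len(root.findall(".//book"))
    # XPath '=' against a sequence is true if it matches ANY item in the sequence.
    return not (n_price in (n_book, 0))


def empty_form(xml_text):
    """not(empty(//price) or empty(//book[not(price)]))"""
    root = ET.fromstring(xml_text)
    no_prices = not root.findall(".//price")
    no_unpriced_books = all(book.find("price") is not None for book in root.findall(".//book"))
    return not (no_prices or no_unpriced_books)


CASES = [
    "<bookstore><book><price>1</price></book><book/></bookstore>",  # some priced
    "<bookstore><book><price>1</price></book></bookstore>",         # all priced
    "<bookstore><book/><book/></bookstore>",                        # none priced
]

for xml_text in CASES:
    print(count_form(xml_text), empty_form(xml_text))
```

They agree whenever each book has at most one price. If a book could carry two price elements, the count-based form can misclassify a document, which is why comparing books with and without a price is the more robust test.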

It seems a very strange query, though. Perhaps you should be using a schema for validation?

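Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.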

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM