
Marklogic: Find documents containing elements without a particular attribute (maybe many per document)

I have some data which looks something like this:

<wrapper>
  <inner a="1"/>
  <inner a="2" b="3"/>
</wrapper>

The attribute b may or may not be present on each inner element. My aim is to find all documents containing at least one inner element that doesn't have attribute b.

This similar question proposes the answer:

cts:not-query(cts:element-attribute-value-query(xs:QName('inner'), xs:QName('b'), '*', ("wildcarded")))

but that doesn't work: some inner elements in the same document may have attribute b, and not-queries are evaluated against the entire fragment, so a mixed case like the example above would not be returned. Wrapping it in an element-query doesn't help, and cts:and-not-query seems to behave the same way.

I have also tried attacking the problem using the co-occurrence/values functions to read the values of the relevant a attributes, but that also seems to be impossible. It might have been possible with proximity settings on the co-occurrence calls, except that there is no element text, so the attributes are indexed with the same word positions.

Are there any alternatives to the blunt XPath?

//inner[@a and not(@b)]

You can always make the XPath more complicated if simplicity isn't your goal. How about this one? It more accurately answers the exact question of "return all documents that contain inner elements that do not have an attribute @b":

doc()[exists(//inner[not(@b)])]

I do not know how well this is optimized; some XPath expressions optimize down to the equivalent cts: query and some do not.
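One way to avoid relying on the XPath optimizer is to let an index-resolved cts:search narrow the candidate documents first and apply the XPath predicate only to those. The following is a minimal sketch using the element names from the question, not a tuned query; the "unfiltered" option is a choice made here so that the first step is resolved from the indexes alone.

```xquery
xquery version "1.0-ml";

(: Step 1: candidate documents containing at least one inner element,
   resolved from the indexes without filtering. :)
let $candidates :=
  cts:search(
    fn:doc(),
    cts:element-query(xs:QName("inner"), cts:true-query()),
    "unfiltered")
(: Step 2: precise check in XPath, keeping only documents that have
   at least one inner element without attribute b. :)
for $doc in $candidates[.//inner[fn:not(@b)]]
return xdmp:node-uri($doc)
```

The second step still opens every candidate document, so this only helps when documents containing inner elements are a small part of the database.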

There is another 'trick' involving combining cts expressions represented as maps. Take the results of two searches, using the option that returns the results as a map; you can then use the operators described at https://developer.marklogic.com/blog/im-a-map to do extremely efficient set operations (union, intersection, difference, etc.). When properly constructed, this technique can be as fast as 'native' cts searches, because cts searches use the same general technique internally to resolve results.
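As a rough illustration of the map technique, the sketch below takes the difference of two URI sets. It assumes the URI lexicon is enabled (so cts:uris is available) and that wildcard indexes support the '*' value query; the variable names are made up for this example. Note the limitation: because both queries are resolved per document, the difference yields documents in which no inner element has attribute b, so it does not cover the mixed case from the question.

```xquery
xquery version "1.0-ml";

(: Assumption: the URI lexicon is enabled. :)
(: URIs of documents that contain an inner element, returned as a map. :)
let $with-inner :=
  cts:uris((), "map",
    cts:element-query(xs:QName("inner"), cts:true-query()))
(: URIs of documents in which some inner element carries attribute b. :)
let $with-b :=
  cts:uris((), "map",
    cts:element-attribute-value-query(
      xs:QName("inner"), xs:QName("b"), "*", "wildcarded"))
(: Map difference: keys of $with-inner that are not keys of $with-b. :)
return map:keys($with-inner - $with-b)
```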

Create a path range index on the XPath: //inner[@a and not(@b)], or, if there's no element text, //inner[@a and not(@b)]/@a. Then do

cts:path-range-query('//inner[@a and not(@b)]/@a','>','')

This happens to also allow us to efficiently answer the question of which @a values have a missing @b, using cts:values.
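A minimal sketch of that lexicon lookup, assuming a path range index of type string has already been configured on exactly this path (the path string passed to cts:path-reference has to match the index definition):

```xquery
xquery version "1.0-ml";

(: Assumption: a string path range index exists on this exact path. :)
(: Returns the distinct @a values of inner elements that lack @b. :)
cts:values(cts:path-reference("//inner[@a and not(@b)]/@a"))
```

cts:values also accepts a cts:query argument, so the same call could be restricted to a subset of documents if needed.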

cts:not-in-query has the necessary behaviour to make this work where cts:and-not-query doesn't. For example:

cts:not-in-query(
  cts:element-query(xs:QName('inner'), cts:true-query()),
  cts:element-attribute-value-query(xs:QName('inner'), xs:QName('b'), '*', 'wildcarded')
)

This finds all inner elements at positions that do not match the positions of inner elements with attribute b.

Element position index must be enabled. Wildcard index must be enabled.
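To turn the query into a list of matching documents, it can be passed to cts:search. This is a sketch under the index settings named above, using the element and attribute names from the question; whether a filtered or an unfiltered search gives the better result for a given data set is something to verify rather than assume.

```xquery
xquery version "1.0-ml";

(: inner elements whose positions do not overlap with an inner
   element that carries attribute b. :)
let $query :=
  cts:not-in-query(
    cts:element-query(xs:QName("inner"), cts:true-query()),
    cts:element-attribute-value-query(
      xs:QName("inner"), xs:QName("b"), "*", "wildcarded"))
(: Return the URI of each document that matches the query. :)
for $doc in cts:search(fn:doc(), $query)
return xdmp:node-uri($doc)
```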

http://docs.marklogic.com/cts:not-in-query

