
MarkLogic: search for word in any element apart from specified one(s)

How can I write a cts:query that efficiently finds documents containing a certain word, while ignoring documents where that word occurs only in a certain element?

For example, I want to return documents containing the word "dog" but only if it's in any element apart from <title>.

So, given these documents:

<document id="doc-1">
 <heading>foo</heading>
 <paragraph>foo foo foo</paragraph>
</document>

<document id="doc-2">
 <heading>bar dog</heading>
 <paragraph>bar bar bar</paragraph>
</document>

<document id="doc-3">
 <heading>foo dog</heading>
 <paragraph>dog bar bar</paragraph>
</document>

I want doc 3 returned.

This works:

for $i in $doc-set
where doc($i)//*/text()[contains(normalize-space(lower-case(.)), "dog")]
                       [not(parent::title)]
return $i ;

but it is very slow.

If you always want to exclude the title element, then in the Admin UI go to the database configuration page, click Word Query (on the left), open the Excludes tab, and add that element. The element is then excluded from the word query index, so cts:word-query() won't find terms there.
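The same exclusion can probably be scripted with the Admin API instead of clicking through the Admin UI. The following is a minimal sketch, not a tested configuration script: the database name "Documents" and the assumption that title is in no namespace are placeholders to adjust for your setup.

```xquery
xquery version "1.0-ml";

import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";

(: Assumption: the target database is called "Documents" and the
   excluded element <title> is in no namespace. :)
let $config   := admin:get-configuration()
let $db       := xdmp:database("Documents")

(: Describe the element to exclude from word queries. :)
let $excluded := admin:database-excluded-element("", "title")

(: Add it to the database's word query excludes and save. :)
let $config   := admin:database-add-word-query-excluded-element(
                   $config, $db, $excluded)
return admin:save-configuration($config)
```

Changing the word query settings generally means existing content has to be reindexed before searches reflect the change, so try this on a test database first.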

For a more flexible solution, use the cts:not-in-query() function, AKA "mild not".

cts:search(
  fn:doc(),
  cts:not-in-query(
    cts:word-query("dog"),
    cts:element-word-query(xs:QName("title"), "dog")
  )
)

Note that (as documented on the cts:not-in-query() page) you'll need to turn on the right position indexes. I think for this one, you'll want element word positions turned on, but run some tests.
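If you prefer to script the index settings, something like the following sketch should turn on the position indexes mentioned above. It assumes a database named "Documents" (a placeholder) and enables both word positions and element word positions; whether you need both for your query is worth confirming with your own tests.

```xquery
xquery version "1.0-ml";

import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";

(: Assumption: the target database is called "Documents". :)
let $config := admin:get-configuration()
let $db     := xdmp:database("Documents")

(: Enable the position indexes that cts:not-in-query() relies on. :)
let $config := admin:database-set-word-positions($config, $db, fn:true())
let $config := admin:database-set-element-word-positions($config, $db, fn:true())
return admin:save-configuration($config)
```

Position indexes increase index size and reindexing time, so it is worth measuring the effect on a copy of your data.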

Use cts:search with cts:element-query to restrict the match to the elements you do want searched:

cts:search(//document, 
  cts:element-query((xs:QName('heading'), xs:QName('paragraph')),
    cts:word-query('dog', 'case-insensitive')))

Alternatively, you could create a field index and use XPath expressions to define the content you want to search.
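As a rough sketch of the field approach: assuming you have already configured a field (here given the hypothetical name "body-text") that includes the document content but excludes the title element, and that field word searches are enabled for it, the query could look like this.

```xquery
xquery version "1.0-ml";

(: Assumption: a field named "body-text" is configured on the database
   so that it covers the document but excludes <title>.
   The field name is illustrative, not taken from the question. :)
cts:search(
  fn:doc(),
  cts:field-word-query("body-text", "dog", "case-insensitive")
)
```

The field definition does the excluding, so the query itself stays simple and can be resolved from the indexes.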

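Use cts:search. Say 'Parent' is your root element:

cts:search(fn:doc()/Parent[name()!='Title'], cts:word-query("dog"))

Note that, as written, the predicate tests the name of Parent itself rather than the names of its children, so this expression does not exclude any child element from the search.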
