
MarkLogic: search for word in any element apart from specified one(s)

How can I write a cts:query that efficiently searches for documents that contain a certain word, unless that word only occurs in a certain element?

For example, I want to return documents containing the word "dog", but only if it occurs in some element other than <title>.

So, given these documents:

<document id="doc-1">
 <heading>foo</heading>
 <paragraph>foo foo foo</paragraph>
</document>

<document id="doc-2">
 <heading>bar dog</heading>
 <paragraph>bar bar bar</paragraph>
</document>

<document id="doc-3">
 <heading>foo dog</heading>
 <paragraph>dog bar bar</paragraph>
</document>

I want only doc-3 returned.

This works:

for $i in $doc-set
where doc($i)//*/text()[contains(normalize-space(lower-case(.)), "dog")]
                       [not(parent::title)]
return $i ;

but it's very slow.

If you always want to exclude the title element, then in the Admin UI go to the database configuration page, click on Word Query (on the left), click on the Excludes tab, and add that element. That element will then be excluded from the index, and cts:word-query() won't find terms there.
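The same exclusion can probably be scripted with the Admin API instead of clicking through the UI. This is only a sketch: the database name "Documents" and the empty namespace for title are assumptions, so adjust both to your setup.

```xquery
xquery version "1.0-ml";

import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";

(: Assumption: the content lives in a database named "Documents". :)
let $config := admin:get-configuration()
let $db := xdmp:database("Documents")

(: Assumption: <title> is in no namespace, hence the empty string. :)
let $excluded := admin:database-excluded-element("", "title")

(: Add <title> to the word-query excludes and save the configuration. :)
let $config :=
  admin:database-add-word-query-excluded-element($config, $db, $excluded)
return admin:save-configuration($config)
```

Changing word query settings normally means existing content has to be reindexed before searches reflect the new exclusion.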

For a more flexible solution, use the cts:not-in-query() function, AKA "mild not".

cts:search(
  fn:doc(),
  cts:not-in-query(
    cts:word-query("dog"),
    cts:element-word-query(xs:QName("title"), "dog")
  )
)

Note that (as documented on the cts:not-in-query() page) you'll need to turn on the right position indexes. I think for this one, you'll want element word positions turned on, but run some tests.
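If you would rather script the index settings than set them in the Admin UI, something like the following should work. It is a sketch under assumptions: the database name "Documents" is made up for illustration, and turning on both word positions and element word positions may be more than this query strictly needs, so test which ones matter for you.

```xquery
xquery version "1.0-ml";

import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";

(: Assumption: the content lives in a database named "Documents". :)
let $config := admin:get-configuration()
let $db := xdmp:database("Documents")

(: Enable position indexes so cts:not-in-query() can compare
   where the two sub-queries match within a document. :)
let $config := admin:database-set-word-positions($config, $db, fn:true())
let $config := admin:database-set-element-word-positions($config, $db, fn:true())
return admin:save-configuration($config)
```

Position indexes increase index size and load time, so it is worth enabling only the ones your tests show are required.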

Use cts:search:

cts:search(//document, 
  cts:element-query((xs:QName('heading'), xs:QName('paragraph')),
    cts:word-query('dog', 'case-insensitive')))

Alternatively, you could create a field index and use XPath expressions to define the content you want to search.
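Once such a field exists, the search itself could look roughly like this. The field name "body-text" is hypothetical; it stands for a field you have already configured on the database to cover heading and paragraph but not title.

```xquery
xquery version "1.0-ml";

(: Assumption: a field named "body-text" is already configured on the
   database and covers only the elements that should be searched. :)
cts:search(
  fn:doc(),
  cts:field-word-query("body-text", "dog", "case-insensitive")
)
```

Because the exclusion lives in the field definition, the query stays simple, and different fields can serve different include/exclude rules on the same content.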

Use cts:search. Say 'Parent' is your root element:

cts:search(fn:doc()/Parent[name()!='Title'], cts:word-query("dog"))
