How to get the maximum value of an element from entire database in Marklogic?

I want to get the maximum value of <ID> across all the documents in the database.

Sample document:

<root>
   <ID>3253523</ID>
   <value1>.....</value1>
   <value2>.....</value2>
   <value3>.....</value3>
   <value4>.....</value4>
    .....................
</root>

My database has more than 1 million records, and I want to fetch the ID with the greatest value among them all.

I can't use fn:last() because it won't give me the maximum value.

I need that value to create an incremental counter (the maximum value will become the counter's first value).

Any suggestions for fetching that value efficiently? I can't run a cts:search() over 1 million records, order the results ascending, and then fetch the last value.

You could add an element range index (or a path range index) on the ID element, then use the cts:values function to retrieve the first of the indexed values in descending order.

Example:

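(: Assumes a path range index of scalar type int on the path '/root/ID'. :)
(: Insert 100 sample documents into the "test" collection. :)
for $i in 1 to 100
  let $doc := <root><ID>{$i}</ID></root>
  return
    xdmp:document-insert("/test/doc-" || $i, $doc, (), "test");

(: The semicolon above ends the insert statement, so the documents are committed before this query runs. :)
(: List the indexed values in descending order and take the first one, which is the maximum. :)
(cts:values(cts:path-reference("/root/ID"), (), "descending"))[1]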
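
As a hedged aside, the same lookup can also be written with cts:max, which resolves the maximum from the range index without listing the values. This is a minimal sketch that assumes the same int path range index on '/root/ID' as the example above. The commented-out variant additionally assumes an int element range index on ID (no namespace), which the example above does not set up.

```xquery
xquery version "1.0-ml";

(: Assumption: an int path range index exists at '/root/ID'. :)
cts:max(cts:path-reference("/root/ID"))

(: Variant, assuming an int element range index on ID instead:
   cts:max(cts:element-reference(xs:QName("ID"), "type=int"))
:)
```

Either way the value comes from the index, so no documents are read from disk.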

When you need the max, or some other aggregate, of an element containing a date, price, number, or other kind of value, the answer from Elijah is adequate.

For the specific case of sequential numbers, there is a bit more to it. How do you guarantee uniqueness across threads when you have parallel ingestion? It is a non-trivial problem, and because of that we typically recommend against using sequential numbers, for performance reasons. Use random numbers instead. That makes collisions practically impossible, and it avoids contention over deriving the max + 1 ID.
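
To illustrate the random-number approach, here is a minimal sketch. It is not taken from any library. It assumes an ID may be any unsigned long, and it reuses the "/test/doc-" URI prefix and "test" collection from the earlier example purely as placeholders.

```xquery
xquery version "1.0-ml";

(: Draw a random unsigned long instead of computing max(ID) + 1,
   so parallel ingest threads do not need to read a shared counter. :)
let $id := xdmp:random()
let $doc := <root><ID>{$id}</ID></root>
return
  (: Placeholder URI scheme and collection, for illustration only. :)
  xdmp:document-insert("/test/doc-" || $id, $doc, (), "test")
```

Values from xdmp:random() can exceed the int range, so a range index on ID would need a wider scalar type such as unsignedLong.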

I've created a library that offers various ways of generating unique identifiers and elaborates on the pros and cons of each:

https://github.com/grtjn/ml-unique#how-it-works

HTH!


 