
Marklogic collate sequence in XQuery

Is there a way to modify the elements of a sequence so that only the collated versions of the items are returned?

let $currencies := ('dollar', 'Dollar', 'dollar ')
return fn:collated-only($currencies, "http://marklogic.com/collation/en/S1/T00BB/AS")

=> ('dollar', 'dollar', 'dollar')

The values stored in the range index (which feeds the facets) are literally the first value encountered that compared equal to the others. (Because the collation says you don't care...)

You can get a long way by calling fn:replace(fn:lower-case(xdmp:diacritic-less(fn:normalize-unicode($str,"NFKC"))),"\p{P}","")

This won't be exactly the same, since it overfolds some things and underfolds others, but it may be good enough for your purposes.
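As a rough sketch of that approach, the expression can be wrapped in a helper and applied to each item. The function name local:fold is hypothetical, and also stripping whitespace with \s is an added assumption (so that 'dollar ' loses its trailing space); it is not part of the expression above. XQuery string literals do not use backslash escapes, so the pattern is written with a single backslash.

```xquery
xquery version "1.0-ml";

(: Hypothetical helper: approximates a case-, diacritic- and
   punctuation-insensitive fold. It is not a true collation key. :)
declare function local:fold($str as xs:string) as xs:string
{
  fn:replace(
    fn:lower-case(
      xdmp:diacritic-less(
        fn:normalize-unicode($str, "NFKC"))),
    (: \p{P} strips punctuation; \s (whitespace) is an added assumption :)
    "[\p{P}\s]", "")
};

let $currencies := ('dollar', 'Dollar', 'dollar ')
for $c in $currencies
return local:fold($c)
```

Because this is a string transformation rather than a collation comparison, two values that fold to the same string are not guaranteed to compare equal under the collation, and vice versa.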

Is this the expected output? There is no fn:collated-only function, so I'm assuming you're asking how to write such a function, or whether one exists.

The thing is, collation comparisons do not map one string to another; there is only a comparison algorithm (the Unicode Collation Algorithm). So there really is no canonical form of the string to return to you, and therefore no API to do so.

Stepping back, what is the problem you are actually trying to solve? By the rules of that collation, "dollar" and "Dollar" are equivalent, and by using it you declare that you don't care which form you get, so you could use either one.
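If any equivalent form will do, one possible sketch is to replace each item with the first item in the sequence that compares equal to it under the collation, using the standard fn:compare function. This assumes the collation URI from the question is available on the server and treats the sample values as equal. It compares every pair of items, so it only suits short sequences.

```xquery
xquery version "1.0-ml";

let $collation := "http://marklogic.com/collation/en/S1/T00BB/AS"
let $currencies := ('dollar', 'Dollar', 'dollar ')
for $c in $currencies
return
  (: first item in the sequence that the collation considers equal to $c :)
  ($currencies[fn:compare(., $c, $collation) eq 0])[1]
```

A related standard call is fn:distinct-values($currencies, $collation), which returns one representative per group of equal values, although which representative is returned is implementation-dependent.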

If these values are in XML elements and you have a range index using http://marklogic.com/collation/en/S1/T00BB/AS, you can do something like this:

let $ref := cts:element-reference(xs:QName("currency"), "collation=http://marklogic.com/collation/en/S1/T00BB/AS")
for $curr in cts:values($ref, (), "frequency-order")
return $curr || ": " || cts:frequency($curr)

This will produce results like:

"dollar: 15",
"euro: 12"

... and so on. The collation will disregard the differences among your sample inputs. These results can be formatted however you want. Is that what you're looking to do?
