What is the purpose of sorted sets?

Clojure has a function sorted-set which creates a PersistentTreeSet object. As the name implies, sorted-set creates a sorted collection of unique objects.

When are sorted sets useful? When is it better to use sorted-set than sort and distinct?

=> (apply sorted-set [2 2 1 1 3 3])
#{1 2 3}
=> (sort (distinct [2 2 1 1 3 3]))
(1 2 3)

Sorted sets are useful when you need set semantics (fast contains?, conj and disj, i.e. element removal, as explained by Leon) together with traversals in a well-defined order. In the case of the built-in sorted sets (and maps), ordered traversals are possible over the entire set (seq, rseq) and over any "subrange" (subseq, rsubseq) between two keys, inclusive or exclusive.
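A minimal REPL-style sketch of the ordered traversals mentioned above, using only clojure.core. The set s and its contents are made up for illustration.

```clojure
;; A small sorted set to traverse (illustrative data).
(def s (sorted-set 1 3 5 7 9))

;; Whole-set traversals, ascending and descending.
(seq s)                ;=> (1 3 5 7 9)
(rseq s)               ;=> (9 7 5 3 1)

;; Subrange between two keys: inclusive lower bound, exclusive upper bound.
(subseq s >= 3 < 9)    ;=> (3 5 7)

;; The same idea in descending order: exclusive lower, inclusive upper.
(rsubseq s > 3 <= 9)   ;=> (9 7 5)

;; One-sided subrange: everything strictly greater than 5.
(subseq s > 5)         ;=> (7 9)
```

The tests passed to subseq and rsubseq have to be one of <, <=, > or >=.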

If you're willing to reach for collections outside clojure.core, the Contrib library data.avl (of which I am the author and maintainer) offers a flavour of sorted sets and maps with additional functionality: nth for access to set elements by rank, rank-of for discovering the rank of an element in the set, nearest-neighbour queries, and "subrange" and split-like operations that return fully functional subsets of the input collection (think subseq returning a fully functional subset of the original, not just a seq, and without holding on to any elements of the original that are absent from the subset, so those can be garbage-collected). All of these operate in O(log n) time worst-case, just like the standard sorted set operations.

If you only need contains? + conj + disj, you'll probably want to use hash sets instead, since they tend to deliver better performance for these operations. It is worth noting, however, that if you anticipate adding inputs from a possibly malicious outside source to your sets, you may want to go with sorted sets even if you don't care about the order. This is because hash sets' performance degrades to O(n) in the presence of hash collisions (which an adversary could force, the hash function in use being deterministic and fixed in advance), whereas sorted sets' O(log n) is a hard guarantee.
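For reference, a short sketch showing that hash sets and sorted sets expose the same contains? / conj / disj operations, so switching between them is mostly a matter of changing the constructor. The names hs and ss are made up for illustration.

```clojure
;; Same elements, two set implementations (illustrative names).
(def hs (hash-set 3 1 2))
(def ss (sorted-set 3 1 2))

;; Membership works the same way on both.
(contains? hs 2)   ;=> true
(contains? ss 2)   ;=> true

;; conj and disj return new sets of the same kind as their input.
(conj ss 0)        ;=> #{0 1 2 3}
(disj ss 3)        ;=> #{1 2}

;; Equality ignores the implementation; only the elements matter.
(= hs ss)          ;=> true

;; Only the sorted set promises a traversal order.
(seq ss)           ;=> (1 2 3)
```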

If you only need to sort your input collection once and then repeatedly traverse it in whole, or traverse various prefixes/suffixes of it, then building up a sorted vector of unique items may indeed be the better option. A sorted set may still be preferable even for a traversal-only workload, though, if you need the subseq / rsubseq feature of starting at an arbitrary element of the collection: (subseq a-set >= 5) is a seq over those elements of a-set which are >= 5 with respect to a-set's ordering.
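A rough sketch of the two options, assuming a small made-up input xs. The sorted vector is built once with sort and distinct and sliced by index with subvec; the sorted set can instead start a traversal at a key, whether or not that key is in the set.

```clojure
;; Illustrative input with duplicates.
(def xs [5 3 8 3 1 8 9])

;; Option 1: sort and de-duplicate once, keep the result as a vector.
(def v (vec (sort (distinct xs))))   ;=> [1 3 5 8 9]

;; Prefixes and suffixes are addressed by index.
(subvec v 0 3)                       ;=> [1 3 5]
(subvec v 2)                         ;=> [5 8 9]

;; Option 2: a sorted set, traversed from an arbitrary key.
(def a-set (apply sorted-set xs))
(subseq a-set >= 5)                  ;=> (5 8 9)

;; The starting key does not have to be an element of the set.
(subseq a-set >= 4)                  ;=> (5 8 9)
```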

Personally, I use sorted sets when I want an ordered data structure without duplicates that stays that way as I add elements. That being said, I start with an empty sorted set and add to it, rather than applying sorted-set to an existing list.
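A small sketch of that incremental style, with made-up values: start from an empty sorted set and conj elements in as they arrive. The names acc and acc2 are only for illustration.

```clojure
;; Start empty and add elements one at a time; duplicates are ignored
;; and the order is maintained after every step.
(def acc (-> (sorted-set)
             (conj 3)
             (conj 1)
             (conj 3)
             (conj 2)))

acc   ;=> #{1 2 3}

;; into does the same thing for a batch of new elements.
(def acc2 (into acc [5 4 1]))

acc2  ;=> #{1 2 3 4 5}
```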

I would use sort and distinct when I already have some other data structure, such as a list, that I want to order and de-duplicate.

Basically, applying sorted-set gives you a new set object holding the unique elements, while distinct returns a lazy sequence computed from the original list. Neither one modifies its input.

The difference between a sorted set and the result of calling sort and distinct is that the resulting type is a set.

This gives you O(log N) performance (think binary search) to check whether an element is in the collection (contains?) or to add one (conj). On the sequence returned by sort and distinct, you would have to scan the elements one by one, which is O(N), to get the same behavior.
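To make the type difference concrete, here is a minimal sketch using the input from the question. The names s and l are made up for illustration.

```clojure
(def xs [2 2 1 1 3 3])

;; A real set, backed by a tree.
(def s (apply sorted-set xs))

;; A plain sorted sequence with duplicates removed.
(def l (sort (distinct xs)))

(set? s)              ;=> true
(set? l)              ;=> false

;; Membership on the set is a tree lookup.
(contains? s 2)       ;=> true

;; On the sequence, membership means scanning the elements in order.
(some #(= 2 %) l)     ;=> true

;; Adding to the set keeps it sorted and free of duplicates.
(conj s 0 2)          ;=> #{0 1 2 3}
```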
