
How to remove elements from a vector in a fast way in Clojure?

I'm trying to remove elements from a Clojure vector:

Note that I'm using Clojure's operations from Kotlin.

val set = PersistentHashSet.create("foo")
val vec = PersistentVector.create("foo", "bar")
val seq = clojure.`core$remove`.invokeStatic(set, vec) as ISeq
val resultVec = clojure.`core$vec`.invokeStatic(seq) as PersistentVector

This is the equivalent of the following Clojure code:

(remove #{"foo"} ["foo" "bar"])

The code works fine, but I've noticed that creating a vector from the seq is extremely slow. I've written a benchmark and these were the results:

| Item count | Remove (ms) | Remove with converting back to vector (ms) |
|------------|-------------|--------------------------------------------|
| 1000       | 51          | 1355                                       |
| 10000      | 71          | 5123                                       |

Do you know how I can convert the seq resulting from the remove operation back to a vector without the harsh performance penalty?

If it is not possible, is there an alternative way to perform the remove operation?

What you are trying to do fundamentally performs badly. Vectors are for fast indexed read/write, and O(1) access to the right end. To do anything else you must tear the vector apart and rebuild it again, an O(N) operation. If you need an operation like this to be efficient, you must use a different data structure.
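As a rough illustration of that cost model (the helper name `remove-at` is made up for this sketch and is not a core function), the cheap operations work at an index or at the right end, while dropping an element from the middle means building a new vector out of the pieces:

```clojure
(def v [10 20 30 40 50])

;; Cheap: indexed read/write and work at the right end.
(nth v 2)        ; => 30
(assoc v 2 :x)   ; => [10 20 :x 40 50]
(peek v)         ; => 50
(pop v)          ; => [10 20 30 40]
(conj v 60)      ; => [10 20 30 40 50 60]

;; Not cheap: removing from the middle copies the remaining elements.
;; `remove-at` is a hypothetical helper, not part of clojure.core.
(defn remove-at [v i]
  (vec (concat (subvec v 0 i) (subvec v (inc i)))))

(remove-at v 2)  ; => [10 20 40 50]
```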

You could try the complementary operation to remove, one that returns a vector:

(filterv (complement #{"foo"}) 
         ["foo" "bar"])

Note the use of filterv. The v indicates that it uses a vector from the start, and returns a vector, so no conversion is required. It uses a transient vector behind the scenes, so it should be pretty fast.

I'm negating the predicate using complement so I can use filterv, since there is no removev. remove is just defined as the complement of filter anyway, so it's basically what you were already doing, just strict.
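If the negation gets repetitive, it could be wrapped in a small helper. The name `removev` below is hypothetical (as noted, clojure.core does not provide one). The second form is an alternative sketch that uses the transducer arity of `remove` with `into`, which also builds the vector eagerly; it assumes Clojure 1.7 or later.

```clojure
;; Hypothetical helper: an eager, vector-returning counterpart of `remove`.
(defn removev [pred coll]
  (filterv (complement pred) coll))

(removev #{"foo"} ["foo" "bar"])           ; => ["bar"]

;; Alternative: the `remove` transducer, poured into a vector.
(into [] (remove #{"foo"}) ["foo" "bar"])  ; => ["bar"]
```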

Why not a PersistentHashSet? Fast removal, though not ordered. I do vaguely recall Clojure also having a sorted set in case that's needed.
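A minimal sketch of the set-based alternative, assuming that element order does not matter (or that sorted order is acceptable) and that duplicates do not need to be kept. Clojure does provide `sorted-set` in clojure.core:

```clojure
(require '[clojure.set :as set])

;; Hash set: remove one element with disj; no ordering guarantees.
(disj #{"foo" "bar"} "foo")                           ; => #{"bar"}

;; Remove several elements at once.
(set/difference #{"foo" "bar" "baz"} #{"foo" "baz"})  ; => #{"bar"}

;; Sorted set: the remaining elements stay in sorted order.
(disj (sorted-set 3 1 2) 2)                           ; => #{1 3}
```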

You have made the error of treating the lazy result of remove as equivalent to the concrete result of converting back to a vector. Compare the lazy result of (remove ...) with the concrete result implied by (count (remove ...)). You will see that it is slightly slower than just doing (vec (remove ...)). Also, for real speed-critical applications, there is nothing like using a native Java ArrayList:

(ns tst.demo.core
  (:require
    [criterium.core :as crit])
  (:import [java.util ArrayList]))

(def N 1000)
(def tgt-item (/ N 2))

(def pred-set #{ (long tgt-item) })
(def data-vec (vec (range N)))

(def data-al (ArrayList. data-vec))
(def tgt-items (ArrayList. [tgt-item]))


(println :lazy)
(crit/quick-bench
  (remove pred-set data-vec))

(println :lazy-count)
(crit/quick-bench
  (count (remove pred-set data-vec)))

(println :vec)
(crit/quick-bench
  (vec (remove pred-set data-vec)))

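(println :ArrayList)
;; Note: removeAll mutates data-al in place, so only the first evaluation
;; actually removes an element; later evaluations just scan the list.
(crit/quick-bench
  (let [changed? (.removeAll data-al tgt-items)]
    data-al))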

with results:

:lazy           Evaluation count : 35819946     time mean :    10.856 ns 
:lazy-count     Evaluation count :     8496     time mean : 69941.171 ns 
:vec            Evaluation count :     9492     time mean : 62965.632 ns 
:ArrayList      Evaluation count :   167490     time mean :  3594.586 ns
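
The laziness point can also be checked without a benchmark. In this small sketch, `realized?` reports whether the sequence returned by `remove` has been computed yet; the name `lazy-result` is only for illustration.

```clojure
(def lazy-result (remove #{2} [1 2 3]))

(realized? lazy-result)   ; => false, nothing has been filtered yet
(count lazy-result)       ; => 2, forces the whole sequence
(realized? lazy-result)   ; => true
```

This is why the :lazy timing of roughly 10 ns measures only the creation of the lazy sequence object, not the removal itself.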
