
Map part of a vector efficiently in Clojure

I wonder how this can be done in Clojure idiomatically and efficiently:

1) Given a vector of integers: [A_0 A_1 A_2 A_3 ... A_n]

2) Increase the last x items by 1 (let's say x is 100) so the vector becomes: [A_0 A_1 A_2 A_3 ... (A_(n-99) + 1) (A_(n-98) + 1) ... (A_(n-1) + 1) (A_n + 1)]

One naive implementation looks like:

(defn inc-last [x nums]
  (let [n (count nums)]
    ;; walk indices and values together; inc only where index >= n - x
    (map #(if (>= % (- n x)) (inc %2) %2)
         (range n)
         nums)))

(inc-last 2 [1 2 3 4])
;=> (1 2 4 5)   ; a lazy seq, not a vector

This implementation maps over the entire vector, examining each item to see whether it needs to be increased, and returns a new sequence.

However, this is an O(n) operation while I only want to change the last x items in the vector. Ideally, this should be done in O(x) instead of O(n).

I am considering using some functions like split-at/concat to implement it like below:

(defn inc-last [x nums]
  ;; split at (count nums) - x so that the last x items land in nums2
  (let [[nums1 nums2] (split-at (- (count nums) x) nums)]
    (concat nums1 (map inc nums2))))

However, I am not sure if this implementation is O(n) or O(x). I am new to Clojure and not really sure what the time complexity will be for operations like concat/split-at on persistent data structures in Clojure.

So my questions are:

1) What is the time complexity of the second implementation?

2) If it is still O(n), is there any idiomatic and efficient implementation that takes only O(x) in Clojure for solving this problem?

Any comment is appreciated. Thanks.

Update:

noisesmith's answer told me that split-at converts the vector into a list, which I had not realised before. Since I will do random access on the result (calling nth after processing the vector), I would like an efficient solution (O(x) time) that keeps the result a vector rather than a list; otherwise nth will slow down my program as well.
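
To make the point about split-at concrete, here is a small REPL-style sketch. The names nums and parts are only for illustration, and the sketch checks result types rather than measuring anything.

```clojure
;; split-at and concat return seqs, never vectors, even for vector input.
(def nums [1 2 3 4])

(def parts (split-at 2 nums))

(vector? nums)                   ;=> true
(vector? (first parts))          ;=> false
(vector? (second parts))         ;=> false
(vector? (apply concat parts))   ;=> false

;; vec copies the seq back into a vector, which is itself a linear-time step.
(vec (apply concat parts))       ;=> [1 2 3 4]
```

So getting a vector back out of the split-at/concat version costs another full pass over the data.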

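concat and split-at both turn the input into a seq, effectively a linked-list representation, so producing the result takes time linear in the length of the input and the result no longer has fast random access. Here is how to do it with a vector, in time proportional to the number of items changed (n in the code below, which the question calls x).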

user> (defn inc-last-n
        [n x]
        (let [count (count x)
              update (fn [x i] (update-in x [i] inc))]
          (reduce update x (range (- count n) count))))
#'user/inc-last-n
user> (inc-last-n 3 [0 1 2 3 4 5 6])
[0 1 2 3 5 6 7]

This will fail on input that is not associative (like seq / lazy-seq) because there is no O(1) access time in non-associative types.
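
A minimal sketch of both behaviours, assuming the same logic as the function above but restated with local names that do not shadow clojure.core (len, acc and v are my names, not the answer's):

```clojure
;; Same idea as above: update only the last n indices of vector v.
(defn inc-last-n [n v]
  (let [len (count v)]
    (reduce (fn [acc i] (update-in acc [i] inc))
            v
            (range (- len n) len))))

;; A vector goes in and a vector comes out, so nth stays fast.
(inc-last-n 3 [0 1 2 3 4 5 6])             ;=> [0 1 2 3 5 6 7]
(vector? (inc-last-n 3 [0 1 2 3 4 5 6]))   ;=> true

;; A list is not associative, so the same call throws.
;; The exact exception type is not relied on here.
(try
  (inc-last-n 3 '(0 1 2 3 4 5 6))
  (catch Exception e
    :failed))                               ;=> :failed
```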

inc-last is an implementation using a transient, which gives you a modifiable "in place" vector in constant time; persistent! then returns a persistent vector, also in constant time, so the updates can be made in O(x). The original implementation used an imperative doseq loop but, as mentioned in the comments, transient operations can return a new object, so it's better to keep doing things in a functional way.

I added a doall to the call to inc-last-2 because it returns a lazy seq, whereas inc-last and inc-last-3 return vectors; the doall forces the seq to be realized so that all three can be compared fairly.

According to some quick tests I ran, inc-last and inc-last-3 don't differ much in performance, not even for huge vectors (10000000 elements). The inc-last-2 implementation is a different story: even for a vector of 1000 elements, modifying only the last 10, it's ~100x slower. For smaller vectors, or when n is close to (count nums), the difference is not that large.

(Thanks to Michał Marczyk for his useful comments)

(def x (vec (range 1000)))

(defn inc-last [n x]
  (let [x (transient x)
        l (count x)]
    (->>
      (range (- l n) l)
      (reduce #(assoc! %1 %2 (inc (%1 %2))) x)
      persistent!)))

(defn inc-last-2 [x nums]
  (let [n (count nums)]
    (map #(if (>= % (- n x)) (inc %2) %2)
         (range n) 
         nums)))

(defn inc-last-3 [n x]
  (let [l (count x)]
    (reduce #(assoc %1 %2 (inc (%1 %2))) x (range (- l n) l))))

(time
  (dotimes [i 100]
    (inc-last 50 x)))

(time
  (dotimes [i 100]
    (doall (inc-last-2 10 x))))

(time
  (dotimes [i 100]
    (inc-last-3 50 x)))

;=> "Elapsed time: 49.7965 msecs"
;=> "Elapsed time: 1751.964501 msecs"
;=> "Elapsed time: 67.651 msecs"
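
The remark above about transient operations returning a new object can be sketched as follows. This is a minimal illustration and not part of the benchmark; squares-upto is a hypothetical helper name.

```clojure
;; conj! and assoc! may return a different object than the one passed in,
;; so thread the return value through instead of relying on in-place mutation.
(defn squares-upto [n]
  (persistent!
    (reduce (fn [t i] (conj! t (* i i)))   ; keep using what conj! returns
            (transient [])
            (range n))))

(squares-upto 5)   ;=> [0 1 4 9 16]
```

This is the same shape as inc-last above: reduce carries the transient from step to step, and persistent! is called once at the end.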


 