How do I run multiple functions which operate on the same collection, but only traverse the collection once? (clojure, example included)

This is a bit of a weird one, but I essentially need to run two independent functions on a vector. They both map over the vector and return a result. If I were to run them one after the other, it would mean going over the collection twice. How would I make it so that I only have to map over the collection once and can still perform both functions? The functions themselves can't be changed, as they're used elsewhere independently. I might not be making much sense, so an example might be better:

(defn add-one [xs] (map #(+ % 1) xs))
(defn minus-one [xs] (map #(- % 1) xs))
(def my-xs [1 2 3])
(def result {:added (add-one my-xs) :minused (minus-one my-xs)})

So, I'd essentially like to be able to calculate "result" but only have to go over "xs" once. I'm not sure it's at all possible to do this given that the functions expect a collection, but I thought I'd check in case there was some Clojure magic I'm missing :D

EDIT - I could just use inc/dec in this example, but the point is that I need to leverage the functions which operate on a collection, as the actual functions are a lot more complex :)

There is no general solution. If you have two arbitrary functions which consume a sequence and operate on it in some unknown way to produce a result, you cannot avoid traversing the sequence twice.

For various kinds of constraints on the functions, combinations are possible. You've already seen in the comments how [(map f xs) (map g xs)] can be replaced with (apply map list (map (juxt f g) xs)); similar kinds of things can be done for consumers with a monoidal structure, like combining min and max, or if they are both just (fn [xs] (reduce f a xs)).
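As a rough illustration of the two cases mentioned above, here is a minimal sketch. The names `xs`, `incs-and-decs` and `min-max` are made up for this example, and `inc`/`dec` stand in for arbitrary element-wise functions.

```clojure
(def xs [3 1 4 1 5])

;; Element-wise functions: apply both to each element, then transpose the pairs.
(def incs-and-decs
  (apply map list (map (juxt inc dec) xs)))
;; incs-and-decs => ((4 2 5 2 6) (2 0 3 0 4))

;; Two reductions folded into one by carrying a pair of accumulators.
(defn min-max [coll]
  (reduce (fn [[lo hi] x] [(min lo x) (max hi x)])
          [(first coll) (first coll)]
          (rest coll)))

(min-max xs)
;; => [1 5]
```

Note that the transposition step still walks the intermediate sequence of pairs once per output; only the original `xs` is traversed a single time.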

The idea is to take the collection functions add-one and minus-one and make them applicable to a single element:

(defn singlify [fun] (fn [x] (first (fun [x]))))

;; now you can do:
((singlify #'add-one) 3)
;; => 4
;; so #'add-one became a function applicable to a single element
;; instead of an entire sequence/collection
;; - well, actually we just wrap the argument x in a vector, apply
;; the sequence function to this `[x]`, and then take the result out of the
;; result list/vec/seq with `first` - so this is not the most performant solution.
;; However, we can now use the functions and get the single-element versions
;; of them without having to modify the original functions.
;; This generalizes to any collection function that, like `map`, treats
;; each element independently; it does not hold for functions whose output
;; for one element depends on the others.
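To make the scope of this trick concrete, here is a small sketch of a case where it does not apply. `running-sum` is a hypothetical collection function, not part of the question; its output for each position depends on the preceding elements, so wrapping each element in its own one-element vector changes the result.

```clojure
(defn singlify [fun] (fn [x] (first (fun [x]))))

;; Hypothetical collection function whose output depends on earlier elements.
(defn running-sum [xs] (reductions + xs))

(running-sum [1 2 3])
;; => (1 3 6)

;; Each element is processed in isolation, so the running total is lost.
(map (singlify running-sum) [1 2 3])
;; => (1 2 3)
```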

Now, using the helpful hints from the comments, which point out that juxt makes it possible to apply two different functions to each element of a sequence while traversing it just once, we get

(map (juxt (singlify #'add-one) (singlify #'minus-one)) my-xs)
;; => ([2 0] [3 1] [4 2])

Using zipmap and the helpful lispy idiom (apply map #'<collector-function> <input-collection>) to transpose the result list, we can split them into a dict/map with the corresponding keywords up front:

(zipmap [:added :minused] 
        (apply map vector 
               (map (juxt (singlify #'add-one) 
                          (singlify #'minus-one)) 
                    my-xs)))
;; => {:added [2 3 4], :minused [0 1 2]}

Generalize as a function

We can generalize this as a function which takes a seq of keys, a seq of to-be-applied collection functions, and the input seq/collection that is to be traversed only once:

;; this helper function applies `juxt` 
;; on the `singlify`-ed versions of the collection-functions:
(defn juxtify [funcs] (apply #'juxt (map #(singlify %) funcs)))

;; so the generalized function is:
(defn traverse-once [keys seq-funcs sq]
  (zipmap keys (apply map vector (map (juxtify seq-funcs) sq))))

Using this function, the example case looks like this:

(traverse-once [:added :minused] [#'add-one #'minus-one] my-xs)
;; => {:added [2 3 4], :minused [0 1 2]}
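One edge case worth keeping in mind: with an empty input, `(apply map vector ...)` receives no sequences at all, so `(map vector)` returns a transducer and `zipmap` would fail when it tries to treat it as a seq. The following is only a sketch of one possible variant that avoids the transposition by building every result vector inside a single `reduce`. The name `traverse-once-safe` is an assumption made for this example.

```clojure
(defn add-one [xs] (map #(+ % 1) xs))
(defn minus-one [xs] (map #(- % 1) xs))

(defn singlify [fun] (fn [x] (first (fun [x]))))

;; Hypothetical variant: one accumulator vector per function, filled in one pass.
(defn traverse-once-safe [ks seq-funcs sq]
  (let [fs   (mapv singlify seq-funcs)
        init (vec (repeat (count fs) []))]
    (zipmap ks
            (reduce (fn [accs x] (mapv #(conj %1 (%2 x)) accs fs))
                    init
                    sq))))

(traverse-once-safe [:added :minused] [add-one minus-one] [1 2 3])
;; => {:added [2 3 4], :minused [0 1 2]}

(traverse-once-safe [:added :minused] [add-one minus-one] [])
;; => {:added [], :minused []}
```

Unlike the lazy version, this one is eager, so it is not suitable for infinite inputs.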

We can now extend it as we want:

(traverse-once [:squared 
                :minused 
                :added] 
               [(fn [sq] (map #(* % %) sq)) 
                #'minus-one 
                #'add-one] 
               my-xs)
;; => {:squared [1 4 9], :minused [0 1 2], :added [2 3 4]}

Voila!

The functions themselves can't be changed as they're used elsewhere independently.

Refactor so that the operations collecting the values are independent of the sequence consumption, then reimplement the sequence consumption functions atop these refactorings (so the existing API is honoured, as that's a hard constraint), and then compose the independent operations as needed to avoid repeated sequence iteration.

In the case that you are consuming an upstream package from outside your organisation, open a dialogue with them about changing the API to allow for efficient operation in this way.

Any approaches along the lines of digging around the functions, such as singlify, are likely to break in subtle ways or confuse future maintainers, even assuming that the cost of boxing sequence items in a vector for reconsumption isn't a performance issue.
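A minimal sketch of what such a refactoring might look like for the example in the question. The element-level names `add-one*` and `minus-one*` and the combined function `add-and-minus` are hypothetical, chosen only for illustration.

```clojure
;; Element-level operations, independent of any sequence consumption.
(defn add-one* [x] (+ x 1))
(defn minus-one* [x] (- x 1))

;; The existing collection API, reimplemented on top of them.
(defn add-one [xs] (map add-one* xs))
(defn minus-one [xs] (map minus-one* xs))

;; Composition of the element-level operations in a single pass.
(defn add-and-minus [xs]
  (reduce (fn [acc x]
            (-> acc
                (update :added conj (add-one* x))
                (update :minused conj (minus-one* x))))
          {:added [] :minused []}
          xs))

(add-and-minus [1 2 3])
;; => {:added [2 3 4], :minused [0 1 2]}
```

Existing callers of `add-one` and `minus-one` see the same behaviour as before, while new code can compose the element-level functions directly.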

  • Traverse the sequence once, using juxt to calculate all the results you want for each element - call it the base sequence.
  • Return a map of sequences, each of which selects its own element from the elements of the base sequence. This is transposing on demand.

Thus:

(defn one-pass-maps [fn-map]
  (let [fn-vector (apply juxt (vals fn-map))]
    (fn [coll]
      (let [base (map fn-vector coll)]
        (zipmap (keys fn-map) (map (fn [n] (map #(% n) base)) (range)))))))

For example,

((one-pass-maps {:inc inc, :dec dec}) (range 10))
=> {:inc (1 2 3 4 5 6 7 8 9 10), :dec (-1 0 1 2 3 4 5 6 7 8)}

All the traversals are lazy. The base sequence is only realized as far as any transposed sequence travels. But if one sequence is realized up to, say, the fifth element, the results for all of the functions have been computed up to that element.
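The sharing of the base sequence can be checked with a call counter. This is only a sketch: `inc-calls` and `counted-inc` are names invented for the check, and `one-pass-maps` is repeated so that the snippet stands on its own.

```clojure
(defn one-pass-maps [fn-map]
  (let [fn-vector (apply juxt (vals fn-map))]
    (fn [coll]
      (let [base (map fn-vector coll)]
        (zipmap (keys fn-map) (map (fn [n] (map #(% n) base)) (range)))))))

;; Count how often the wrapped function is actually invoked.
(def inc-calls (atom 0))
(defn counted-inc [x] (swap! inc-calls inc) (inc x))

(def result ((one-pass-maps {:inc counted-inc, :dec dec}) [1 2 3]))

;; Fully realize both output sequences.
(doall (:inc result))
(doall (:dec result))

@inc-calls
;; => 3
```

Both outputs were consumed, yet `counted-inc` ran once per input element, because each output only selects from the cached base sequence.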

A helpful general strategy is to express a function in an algebraic style and use various algebraic laws to optimize it. In this answer, I will focus on the case where the sequence processing can be expressed through reduce or transduce, something that perhaps captures the notion of eagerly traversing a sequence, and use some "tupling laws". For brevity and clarity I will omit the handling of early termination (via reduced), a piece of functionality that isn't too hard to add.

First of all, I will use the modified versions of reduce and transduce shown below, which have some more desirable properties for the present case:

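;; Note: these definitions shadow clojure.core/reduce and
;; clojure.core/transduce in the current namespace.
(defn reduce
  ;; without an init, start from (f) instead of the first element
  ([f coll] (reduce f (f) coll))
  ([f init coll] (clojure.core/reduce f init coll)))

(defn transduce [xf f & args]
  ;; the initial value comes from the transformed reducing function (xf f)
  (let [f (xf f)]
    (f (apply reduce f args))))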
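To see what the modified definitions change, here is a sketch using renamed copies. The names `my-reduce` and `my-transduce` are assumptions made here so that the example does not shadow the core functions. The init-less arity always starts from `(f)`, whereas `clojure.core/reduce` without an init starts from the first element of the collection.

```clojure
(defn my-reduce
  ([f coll] (my-reduce f (f) coll))
  ([f init coll] (reduce f init coll)))

(defn my-transduce [xf f & args]
  (let [f (xf f)]
    (f (apply my-reduce f args))))

;; Starts from (conj) => [], so every element is conj'ed onto a vector.
;; The core version would attempt (conj 1 2) here and throw.
(my-reduce conj [1 2 3])
;; => [1 2 3]

(my-transduce (map inc) + [1 2 3])
;; => 9

;; An explicit init is passed through unchanged.
(my-transduce (map inc) conj [0] [1 2])
;; => [0 2 3]
```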

I will also introduce polymorphic versions of juxt, map and map-indexed, which operate on vectors and maps:

(defprotocol JuxtMap
  (juxt* [fs])
  (map* [coll f])
  (map-indexed* [coll f]))

(extend-protocol JuxtMap

  clojure.lang.IPersistentVector
  (juxt* [fs]
    (fn [& xs]
      (into [] (map #(apply % xs)) fs)))
  (map* [coll f]
    (into [] (map f) coll))
  (map-indexed* [coll f]
    (into [] (map-indexed f) coll))

  clojure.lang.IPersistentMap
  (juxt* [fs]
    (fn [& xs]
      (into {} (map (juxt key (comp #(apply % xs) val))) fs)))
  (map* [coll f]
    (into {} (map (juxt key (comp f val))) coll))
  (map-indexed* [coll f]
    (into {} (map (juxt key (partial apply f))) coll)))

(letfn [(flip [f] #(f %2 %1))]
  (def map* (flip map*))
  (def map-indexed* (flip map-indexed*)))

Now we can create two functions, juxt*-rf and juxt*-xf, which satisfy the following "tupling laws", denoting composition by o and assuming that reduce, transduce and map* are curried in their first parameter and that the first two don't receive an init:

  • reduce o juxt*-rf = juxt* o map*(reduce)
  • transduce o juxt*-xf = juxt* o map*(transduce)
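The first law can be illustrated with a simplified, vector-only reducing function. This sketch is not the `juxt*-rf` of this answer; `tuple-rf` is a hypothetical stand-in that combines several reducing functions into one that carries a vector of accumulators.

```clojure
;; Hypothetical helper: combine reducing functions into one over a vector of accumulators.
(defn tuple-rf [rfs]
  (fn
    ([] (mapv #(%) rfs))                          ; one init per function
    ([accs] (mapv #(%1 %2) rfs accs))             ; completion, per function
    ([accs x] (mapv #(%1 %2 x) rfs accs))))       ; step, per function

(def sum-and-product (tuple-rf [+ *]))

;; Left-hand side of the law: a single pass with the combined function.
(reduce sum-and-product (sum-and-product) [1 2 3 4])
;; => [10 24]

;; Right-hand side: one pass per function.
[(reduce + (+) [1 2 3 4]) (reduce * (*) [1 2 3 4])]
;; => [10 24]
```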

Here they are:

(def juxt*-rf
  (comp
    juxt*
    (partial map-indexed*
      (fn [i f]
        (fn
          ([] (f))
          ([acc] (f (acc i)))
          ([acc x] (f (acc i) x)))))))

(def juxt*-xf
  (comp
    (partial comp juxt*-rf)
    juxt*))

Finally, let's see juxt*-xf in action:

(defn average [coll]
  (->> coll
       (transduce
         (juxt*-xf [identity (map (constantly 1))])
         +)
       (apply /)))

(average [1 2 3])
;result: 2

(defn map-many [fs coll]
  (transduce
    (juxt*-xf (map* map fs))
    conj
    coll))

(map-many {:inc inc, :dec dec} [1 2 3])
;result: {:inc [2 3 4], :dec [0 1 2]}

(transduce
  (juxt*-xf {:transp (juxt*-xf (map* map [first second]))
             :concat cat})
  conj
  [[1 11] [2 12] [3 13]])
;result: {:transp [[1 2 3] [11 12 13]], :concat [1 11 2 12 3 13]}
