
Clojure: Chaining group-by :key with select-keys on remaining keys

I'm trying to understand a workflow with Clojure maps that would be simple in other languages.

It basically comes down to this: how can I chain these operations?

  1. group-by :key on a vector of maps

  2. select-keys on the remaining maps, without the previous key

  3. group-by again (0..n times) and select-keys

  4. count unique key instances at the end.

See also my previous question: Aggregate and Count in Maps

Example:

Given a vector of maps

(def DATA [{:a "X", :b "M", :c "K", :d 10}
           {:a "Y", :b "M", :c "K", :d 20}
           {:a "Y", :b "M", :c "F", :d 30}
           {:a "Y", :b "P", :c "G", :d 40}])

performing group-by

(defn get-tree-level-1 [] (group-by :a DATA))

yields a map grouped by the value of that particular key.

{"X" [{:a "X", :b "M", :c "K", :d 10}],
 "Y" [{:a "Y", :b "M", :c "K", :d 20}
      {:a "Y", :b "M", :c "F", :d 30}
      {:a "Y", :b "P", :c "G", :d 40}]}

So far, so good. But what if I want to build a tree-like structure out of the data? That means selecting some of the remaining keys and ignoring others, e.g. selecting :b and :c and ignoring :d, which would yield at the next level:

(def DATA2   [{ :X [{:b "M", :c "K"}],
                :Y [{:b "M", :c "K"}
                    {:b "M", :c "F"}
                    {:b "P", :c "G"}]}])

And finally, counting all instances of the remaining keys (e.g. counting all unique values of the :b key under the Y root):

(def DATA3   [{ :X [{:M  1}],
                :Y [{:M  2}
                    {:P  1}]}])

I tried doing a select-keys after the group-by, but the result was empty after the first step:

(defn get-proc-sums []
  (into {}
    (map
      (fn [ [k vs] ]
        [k (select-keys vs [:b :c])])
      (group-by :a DATA))))

Repeated application of group-by is the wrong tool: it doesn't compose with itself very well. Rather, go over your input maps and transform each of them, once, into a format that's useful to you (using for or map), and then reduce over that to build your tree structure. Here is a simple implementation:

(defn hierarchy [keyseq xs]
  (reduce (fn [m [ks x]]
            (update-in m ks conj x))
          {}
          (for [x xs]
            [(map x keyseq) (apply dissoc x keyseq)])))

user> (hierarchy [:a :b :c] '[{:a "X", :b "M", :c "K", :d 10}
                              {:a "Y", :b "M", :c "K", :d 20}
                              {:a "Y", :b "M", :c "F", :d 30}
                              {:a "Y", :b "P", :c "G", :d 40}])
{"Y" {"P" {"G" ({:d 40})},
      "M" {"F" ({:d 30}),
           "K" ({:d 20})}},
 "X" {"M" {"K" ({:d 10})}}}

This gives you the hierarchical format that you want, with a list of all maps containing only the "leftover" keys. From this, you can count them, apply distinct to them, remove the :d key, or whatever else you want, either by writing another function that processes this map, or by adjusting what happens in the reduce function or the for comprehension above.
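As one illustration of adjusting the reduce function, here is a minimal sketch that counts records per key path instead of collecting the leftover maps. The name hierarchy-counts is hypothetical and not part of the answer above; the sketch assumes every record contains all of the grouping keys.

```clojure
;; Hypothetical variant of `hierarchy`: counts records per key path
;; instead of conj-ing the leftover maps.
(defn hierarchy-counts [keyseq xs]
  (reduce (fn [m x]
            ;; (map x keyseq) looks up each grouping key in the record;
            ;; fnil lets inc treat a missing count as 0.
            (update-in m (map x keyseq) (fnil inc 0)))
          {}
          xs))

(hierarchy-counts [:a :b]
                  [{:a "X", :b "M", :c "K", :d 10}
                   {:a "Y", :b "M", :c "K", :d 20}
                   {:a "Y", :b "M", :c "F", :d 30}
                   {:a "Y", :b "P", :c "G", :d 40}])
;; => {"X" {"M" 1}, "Y" {"M" 2, "P" 1}}
```

Passing a longer key sequence such as [:a :b :c] yields one more level of nesting, with the counts at the leaves.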

The mistake is that you are trying to select keys from the collection of values, whereas you should do it for every item in the collection, e.g. using map:

(defn get-proc-sums []
  (into {}
        (map
         (fn [ [k vs] ]
           [k (map #(select-keys % [:b :c]) vs)])
         (group-by :a DATA))))

user> (get-proc-sums)
{"X" ({:b "M", :c "K"}), 
 "Y" ({:b "M", :c "K"} {:b "M", :c "F"} {:b "P", :c "G"})}

What you're doing is:

user> (group-by :a DATA)
{"X" [{:a "X", :b "M", :c "K", :d 10}], 
 "Y" [{:a "Y", :b "M", :c "K", :d 20} 
      {:a "Y", :b "M", :c "F", :d 30} 
      {:a "Y", :b "P", :c "G", :d 40}]}

Then you are processing every key-value pair (let's take the "Y" pair as an example):

user> (let [[k vals] ["Y" ((group-by :a DATA) "Y")]]
         [k vals])
["Y" [{:a "Y", :b "M", :c "K", :d 20} 
      {:a "Y", :b "M", :c "F", :d 30} 
      {:a "Y", :b "P", :c "G", :d 40}]]

So you call select-keys on a vector of maps:

user> (select-keys [{:a "Y", :b "M", :c "K", :d 20} 
                    {:a "Y", :b "M", :c "F", :d 30} 
                    {:a "Y", :b "P", :c "G", :d 40}]
                   [:a :b])
{}

This is logical, since the vector itself does not contain these keys. Mapping select-keys over its elements works instead:

user> (map #(select-keys % [:a :b]) [{:a "Y", :b "M", :c "K", :d 20} 
                                     {:a "Y", :b "M", :c "F", :d 30} 
                                     {:a "Y", :b "P", :c "G", :d 40}])
({:a "Y", :b "M"} {:a "Y", :b "M"} {:a "Y", :b "P"})

Update: to fulfill the whole task, I would propose the following:

(defn process-data [data]
  (->> data
       (group-by :a)
       (map (fn [[k vals]] [k (frequencies (map :b vals))]))
       (into {})))

user> (process-data DATA)
{"X" {"M" 1}, "Y" {"M" 2, "P" 1}}
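If the grouping key and the counted key need to vary, the same pipeline can take them as parameters. This is only a sketch: count-by and sample are hypothetical names introduced for illustration, and the data is copied from the question.

```clojure
(def sample [{:a "X", :b "M", :c "K", :d 10}
             {:a "Y", :b "M", :c "K", :d 20}
             {:a "Y", :b "M", :c "F", :d 30}
             {:a "Y", :b "P", :c "G", :d 40}])

;; Groups data by outer-key, then counts the values of inner-key per group.
(defn count-by [outer-key inner-key data]
  (->> data
       (group-by outer-key)
       (map (fn [[k vs]] [k (frequencies (map inner-key vs))]))
       (into {})))

(count-by :a :b sample)
;; => {"X" {"M" 1}, "Y" {"M" 2, "P" 1}}
(count-by :b :c sample)
;; => {"M" {"K" 2, "F" 1}, "P" {"G" 1}}
```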

Here I'll only be addressing the workflow aspect of your question, and one way of thinking through the function design. I present only one way out of many, but I think this way is sufficiently idiomatic. If you're looking for an implementation, amalloy provided a fine one.

The problem you pose is a perfect use case for recursion. You want to build a nested structure where each level of nesting (except for the last) just follows the same grouping process on the previous grouping result. The last level of nesting instead performs a count. And you don't know in advance how many levels of nesting there will be.

You're throwing away the :c and the :d, so you might as well do that at the start; it's logically a distinct processing step.

Let's assume you've written your function (call it foo; I leave writing it as an exercise for the reader). It can construct the nested structure in terms of recursive calls to itself.

Let's take your example data set:

(def DATA [{:a "X", :b "M", :c "K", :d 10}
           {:a "Y", :b "M", :c "K", :d 20}
           {:a "Y", :b "M", :c "F", :d 30}
           {:a "Y", :b "P", :c "G", :d 40}])

Let's ignore :d, so our pre-processed set looks like:

(def filtered-data [{:a "X", :b "M", :c "K"}
                    {:a "Y", :b "M", :c "K"}
                    {:a "Y", :b "M", :c "F"}
                    {:a "Y", :b "P", :c "G"}])
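Rather than typing the pre-processed set by hand, it can be derived from DATA. A minimal sketch, using the hypothetical name filtered so as not to clash with the definition above:

```clojure
(def DATA [{:a "X", :b "M", :c "K", :d 10}
           {:a "Y", :b "M", :c "K", :d 20}
           {:a "Y", :b "M", :c "F", :d 30}
           {:a "Y", :b "P", :c "G", :d 40}])

;; Keep only the keys that take part in the grouping; :d is dropped.
(def filtered
  (mapv #(select-keys % [:a :b :c]) DATA))
```

For this data, (mapv #(dissoc % :d) DATA) gives the same result; select-keys states which keys are kept, dissoc states which are discarded.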

Example

And here's an example "query":

(foo filtered-data
     [:a :b :c])

We want it to spit out a nested structure that looks a bit like this:

[{ :X (foo [{:b "M", :c "K"}]
           [:b :c]),
   :Y (foo [{:b "M", :c "K"}
            {:b "M", :c "F"}
            {:b "P", :c "G"}]
           [:b :c])}]

This in turn is equivalent to:

[{ :X [{:M (foo [{:c "K"}]
                [:c])}],
   :Y [{:M (foo [{:c "K"}
                 {:c "F"}]
                [:c]),
        :P (foo [{:c "G"}]
                [:c])}]
}]

These calls to foo can easily recognize the end of the recursion and switch to a counting behavior:

[{ :X [{:M [{:K 1}]}],
   :Y [{:M [{:F 1}
            {:K 1}],
        :P [{:G 1}]
      }]
}]

Personally, if I were building up such a structure, I'd target one with less "superfluous" nesting, such as this trie:

{"X" {"M" {"K" 1}},
 "Y" {"M" {"F" 1, "K" 1},
      "P" {"G" 1}}}

But I don't know your use case and whether these are truly superfluous. And if you might want to use this data to produce more than one statistic, then see how amalloy made a condensed structure from which you could derive counts, or anything else.
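For the trie form, the recursion described above could be sketched as follows. This is one possible shape for foo, not a definitive implementation: count-trie is a hypothetical name, the key vector is assumed to be non-empty, and the used key is not removed from the maps passed down, since later levels simply ignore it.

```clojure
(defn count-trie [data [k & more]]
  (if (seq more)
    ;; More levels remain: group on k, then recurse into each group.
    (into {}
          (for [[v group] (group-by k data)]
            [v (count-trie group more)]))
    ;; Last level: count the occurrences of each value of k.
    (frequencies (map k data))))

(count-trie [{:a "X", :b "M", :c "K"}
             {:a "Y", :b "M", :c "K"}
             {:a "Y", :b "M", :c "F"}
             {:a "Y", :b "P", :c "G"}]
            [:a :b :c])
;; => {"X" {"M" {"K" 1}}, "Y" {"M" {"K" 1, "F" 1}, "P" {"G" 1}}}
```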
