How do I iterate through a nested dict/hash-map in Clojure to custom-flatten/transform my data structure?

I have something that looks like this:

{:person-123 {:xxx [1 5]
              :zzz [2 3 4]}
 :person-456 {:yyy [6 7]}}

And I want to transform it so it looks like this:

[{:person "123" :item "xxx"}
 {:person "123" :item "zzz"}
 {:person "456" :item "yyy"}]

This is a flatten-like problem, and I know I can convert the keywords into strings by calling name on them, but I haven't found a convenient way to do the whole transformation.

This is how I did it, but it seems inelegant (nested for loops, I'm looking at you):

(require '[clojure.string :refer [split]])
(into [] 
      (apply concat
             (for [[person nested-data] input-data]
                  (for [[item _] nested-data]
                       {:person (last (split (name person) #"person-"))
                        :item (name item)}))))

Your solution is not too bad. As for the nested for loops: for actually supports several bindings in one form, each nested inside the previous one, so you could write it as:

(vec
  (for [[person nested-data] input-data
        [item _] nested-data]
    {:person (last (clojure.string/split (name person) #"person-"))
     :item   (name item)}))

Personally, I tend to use for exclusively for that purpose (nested loops); otherwise I am usually more comfortable with map et al. But that's just a personal preference.

I also very much agree with @amalloy's comment on the question: I would put some effort into having a better-looking map structure to begin with.

(require '[clojure.pprint :refer [pprint]]
         '[clojure.string :as str])

(let [x {:person-123 {:xxx [1 5]
                      :zzz [2 3 4]}
         :person-456 {:yyy [6 7]}}]
  (pprint
    (mapcat (fn [[k v]]
              ;; one result map per inner key; the inner values are ignored
              (map (fn [[k1 _]]
                     {:person (str/replace (name k) #"person-" "")
                      :item   (name k1)})
                   v))
            x)))

I am not sure if there is a single higher-order function, at least in the core, that does what you want in one go.

On the other hand, similar methods exist in the GNU R reshape library, which, by the way, has been recreated for Clojure and might interest you: https://crossclj.info/ns/grafter/0.8.6/grafter.tabular.melt.html#_melt-column-groups

This is how it works in GNU R: http://www.statmethods.net/management/reshape.html

Lots of good solutions so far. All I would add is a simplification with keys:

(vec
     (for [[person nested-data] input-data
           item (map name (keys nested-data))]
       {:person (clojure.string/replace-first
                  (name person)
                  #"person-" "")
        :item   item}))

Note, by the way, the near-universal preference for replace over last/split. If the spirit of the transformation is "lose the leading person- prefix", replace says that better. If, on the other hand, the spirit is "find the number and use that", a bit of regex to isolate the digits would be truer.
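A minimal sketch of the "find the number and use that" reading, using re-find to pull out the first run of digits. The helper names person-id and flatten-people are made up for illustration, and input-data is assumed to be the map from the question.

```clojure
(def input-data
  {:person-123 {:xxx [1 5]
                :zzz [2 3 4]}
   :person-456 {:yyy [6 7]}})

(defn person-id
  "Hypothetical helper: returns the first run of digits in the keyword's
  name as a string, or nil when the name contains no digits."
  [k]
  (re-find #"\d+" (name k)))

(defn flatten-people
  "Hypothetical helper: one {:person ... :item ...} map per inner key."
  [data]
  (vec
    (for [[person nested-data] data
          item (keys nested-data)]
      {:person (person-id person)
       :item   (name item)})))

(flatten-people input-data)
```

Unlike the replace version, this does not care what the prefix is, but it yields nil for a key that has no digits at all, so it is only the better choice if every key is expected to carry a number.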

(require '[clojure.string :as str])

(reduce-kv (fn [ret k v]
             (into ret (map (fn [v-k]
                              {:person (last (str/split (name k) #"-"))
                               :item   (name v-k)})
                            (keys v))))
           []
           {:person-123 {:xxx [1 5] :zzz [2 3 4]}
            :person-456 {:yyy [6 7]}})

=> [{:person "123", :item "xxx"} 
    {:person "123", :item "zzz"} 
    {:person "456", :item "yyy"}]

Here are three solutions.

The first solution uses Python-style lazy generator functions via the lazy-gen and yield functions from the Tupelo library. I think this method is the simplest, since the inner loop produces maps and the outer loop produces a sequence. Also, the inner loop can run zero, one, or multiple times for each iteration of the outer loop. With yield you don't need to think about that part.

(ns tst.clj.core
  (:use clj.core clojure.test tupelo.test)
  (:require
    [clojure.string :as str]
    [clojure.walk :as walk]
    [clojure.pprint :refer [pprint]]
    [tupelo.core :as t]
    [tupelo.string :as ts]
  ))
(t/refer-tupelo)

(def data
  {:person-123 {:xxx [1 5]
                :zzz [2 3 4]}
   :person-456 {:yyy [6 7]}})

(defn reformat-gen [data]
  (t/lazy-gen
    (doseq [[outer-key outer-val] data]
      (let [int-str (str/replace (name outer-key) "person-" "")]
        (doseq [[inner-key inner-val] outer-val]
          (let [inner-key-str (name inner-key)]
            (t/yield {:person int-str :item inner-key-str})))))))
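For comparison, here is a small plain-Clojure sketch (no Tupelo) of the "zero, one, or multiple times" case. The data-with-empty map and the reformat-for name are invented for this illustration; the :person-456 entry deliberately has no items, so the inner binding runs zero times for it.

```clojure
(require '[clojure.string :as str])

(def data-with-empty
  {:person-123 {:xxx [1 5] :zzz [2 3 4]}  ; inner loop runs twice
   :person-456 {}                         ; inner loop runs zero times
   :person-789 {:yyy [6 7]}})             ; inner loop runs once

(defn reformat-for
  "Hypothetical helper: emits one map per inner key; an outer entry with
  an empty inner map contributes nothing to the result."
  [data]
  (for [[outer-key outer-val] data
        [inner-key _] outer-val]
    {:person (str/replace (name outer-key) "person-" "")
     :item   (name inner-key)}))

(reformat-for data-with-empty)
```

The result contains entries for persons 123 and 789 only, without any placeholder for 456.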

If you really want to be "pure", the following is another solution. However, with this solution I made a couple of errors and needed many, many debug printouts to fix them. This version uses tupelo.core/glue instead of concat since it is "safer": it verifies that the collections are all maps, all vectors/lists, etc.

(defn reformat-glue [data]
  (apply t/glue
    (forv [[outer-key outer-val] data]
      (let [int-str (str/replace (name outer-key) "person-" "")]
        (forv [[inner-key inner-val] outer-val]
          (let [inner-key-str (name inner-key)]
            {:person int-str :item inner-key-str}))))))

Both methods give the same answer:

(newline) (println "reformat-gen:")
(pprint (reformat-gen data))
(newline) (println "reformat-glue:")
(pprint (reformat-glue data))

reformat-gen:
({:person "123", :item "xxx"}
 {:person "123", :item "zzz"}
 {:person "456", :item "yyy"})

reformat-glue:
[{:person "123", :item "xxx"}
 {:person "123", :item "zzz"}
 {:person "456", :item "yyy"}]

If you wanted to be "super-pure", here is a third solution (although I think this one is trying too hard!). Here we use the ability of the for macro to have nested bindings in a single expression. The for macro can also embed let expressions inside itself via :let, although here that leads to duplicate evaluation of int-str.

(defn reformat-overboard [data]
  (for [[outer-key outer-val] data
        [inner-key inner-val] outer-val
        :let [int-str       (str/replace (name outer-key) "person-" "") ; duplicate evaluation
              inner-key-str (name inner-key)]]
    {:person int-str :item inner-key-str}))
(newline)
(println "reformat-overboard:")
(pprint (reformat-overboard data))

reformat-overboard:
({:person "123", :item "xxx"}
 {:person "123", :item "zzz"}
 {:person "456", :item "yyy"})
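One way to avoid the duplicate evaluation, sketched under the assumption that the same data map is used: a :let clause in for may appear between binding pairs, so int-str can be bound right after the outer binding and before the inner one. The name reformat-let-hoisted is made up for this example.

```clojure
(require '[clojure.string :as str])

(def data
  {:person-123 {:xxx [1 5]
                :zzz [2 3 4]}
   :person-456 {:yyy [6 7]}})

(defn reformat-let-hoisted
  "Hypothetical variant of reformat-overboard with the :let moved up."
  [data]
  (for [[outer-key outer-val] data
        ;; bound once per outer entry, before the inner binding starts
        :let [int-str (str/replace (name outer-key) "person-" "")]
        [inner-key _] outer-val]
    {:person int-str :item (name inner-key)}))

(reformat-let-hoisted data)
```

The result is the same sequence of maps; only the point at which int-str is computed changes.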

I would probably stick with the first one since it is (at least to me) much simpler and more bulletproof. YMMV.


Update:

Notice that the third method yields a single sequence of maps, even though there are two nested iterations happening inside the one for. This is different from having two nested for expressions, which would yield a sequence of sequences of maps.
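The difference can be seen in a small sketch. The names flat-result, nested-result, and flattened are invented here, and pairs of keys stand in for the result maps to keep the output short.

```clojure
(def data
  {:person-123 {:xxx [1 5] :zzz [2 3 4]}
   :person-456 {:yyy [6 7]}})

;; One `for` with two bindings: a single flat sequence.
(def flat-result
  (for [[outer-key outer-val] data
        [inner-key _] outer-val]
    [outer-key inner-key]))

;; Two nested `for` expressions: one inner sequence per outer entry.
(def nested-result
  (for [[outer-key outer-val] data]
    (for [[inner-key _] outer-val]
      [outer-key inner-key])))

;; `apply concat` removes the extra level of nesting.
(def flattened (apply concat nested-result))
```

This is why the nested version in the question needed apply concat, while the single for form does not.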
