How do I iterate through a nested dict/hash-map in Clojure to custom-flatten/transform my data structure?
I have something that looks like this:
{:person-123 {:xxx [1 5]
              :zzz [2 3 4]}
 :person-456 {:yyy [6 7]}}
And I want to transform it so it looks like this:
[{:person "123" :item "xxx"}
 {:person "123" :item "zzz"}
 {:person "456" :item "yyy"}]
This is a flatten-like problem, and I know I can convert the keywords into strings by calling name on them, but I couldn't come across a convenient way to do this.
This is how I did it, but it seems inelegant (nested for loops, I'm looking at you):
(require '[clojure.string :refer [split]])

(into []
      (apply concat
             (for [[person nested-data] input-data]
               (for [[item _] nested-data]
                 {:person (last (split (name person) #"person-"))
                  :item (name item)}))))
Your solution is not too bad. As for the nested for loops: for actually supports nested loops, so you could write it as:
(vec
 (for [[person nested-data] input-data
       [item _] nested-data]
   {:person (last (clojure.string/split (name person) #"person-"))
    :item (name item)}))
Personally, I tend to use for exclusively for that purpose (nested loops); otherwise I am usually more comfortable with map et al. But that's just a personal preference.
I also very much agree with @amalloy's comment on the question: I would put some effort into having a better-looking map structure to begin with.
(require '[clojure.pprint] '[clojure.string])

(let [x {:person-123 {:xxx [1 5]
                      :zzz [2 3 4]}
         :person-456 {:yyy [6 7]}}]
  (clojure.pprint/pprint
   (mapcat
    (fn [[k v]]
      ;; one map per inner key; the inner values are not needed
      (map (fn [[k1 _]]
             {:person (clojure.string/replace (name k) #"person-" "")
              :item (name k1)})
           v))
    x)))
I am not sure if there is a single higher-order function, at least in the core, that does what you want in one go.
On the other hand, similar methods exist in the GNU R reshape library, which, by the way, has been recreated for Clojure: https://crossclj.info/ns/grafter/0.8.6/grafter.tabular.melt.html#_melt-column-groups which might interest you.
This is how it works in GNU R: http://www.statmethods.net/management/reshape.html
Lots of good solutions so far. All I would add is a simplification with keys:
(vec
 (for [[person nested-data] input-data
       item (map name (keys nested-data))]
   {:person (clojure.string/replace-first
             (name person)
             #"person-" "")
    :item item}))
Note, btw, the near-universal preference for replace over last/split. Guessing the spirit of the transformation is "lose the leading person- prefix", replace says that better. If OTOH the spirit is "find the number and use that", a bit of regex to isolate the digits would be truer.
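If the intent really is "find the number and use that", a minimal sketch could use re-find with a digit pattern. The helper name person-id is made up for this illustration, input-data is the map from the question, and the sketch assumes each key contains a run of digits (re-find returns nil when it does not).

```clojure
(def input-data
  {:person-123 {:xxx [1 5]
                :zzz [2 3 4]}
   :person-456 {:yyy [6 7]}})

;; Hypothetical helper (not from the answers above): returns the first
;; run of digits in the keyword's name, or nil when there is none.
(defn person-id [k]
  (re-find #"\d+" (name k)))

(def result
  (vec
   (for [[person nested-data] input-data
         item (map name (keys nested-data))]
     {:person (person-id person)
      :item item})))
```

Unlike stripping a fixed prefix, this keeps working if the prefix changes (say, :user-123), but it silently yields a nil :person for keys without digits, so which variant is "truer" depends on what the keys are meant to encode.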
(require '[clojure.string :as str])

(reduce-kv (fn [ret k v]
             (into ret (map (fn [v-k]
                              {:person (last (str/split (name k) #"-"))
                               :item (name v-k)})
                            (keys v))))
           []
           {:person-123 {:xxx [1 5] :zzz [2 3 4]}
            :person-456 {:yyy [6 7]}})
=> [{:person "123", :item "xxx"}
    {:person "123", :item "zzz"}
    {:person "456", :item "yyy"}]
Here are three solutions.
The first solution uses Python-style lazy generator functions via the lazy-gen and yield functions from the Tupelo library. I think this method is the simplest since the inner loop produces maps and the outer loop produces a sequence. Also, the inner loop can run zero, one, or multiple times for each outer loop. With yield you don't need to think about that part.
(ns tst.clj.core
  (:use clj.core clojure.test tupelo.test)
  (:require
    [clojure.string :as str]
    [clojure.walk :as walk]
    [clojure.pprint :refer [pprint]]
    [tupelo.core :as t]
    [tupelo.string :as ts]))
(t/refer-tupelo)

(def data
  {:person-123 {:xxx [1 5]
                :zzz [2 3 4]}
   :person-456 {:yyy [6 7]}})

(defn reformat-gen [data]
  (t/lazy-gen
    (doseq [[outer-key outer-val] data]
      (let [int-str (str/replace (name outer-key) "person-" "")]
        (doseq [[inner-key inner-val] outer-val]
          (let [inner-key-str (name inner-key)]
            (t/yield {:person int-str :item inner-key-str})))))))
If you really want to be "pure", the following is another solution. However, with this solution I made a couple of errors and needed many, many debug printouts to fix them. This version uses tupelo.core/glue instead of concat since it is "safer" and verifies that the collections are all maps, all vectors/lists, etc.
(defn reformat-glue [data]
  (apply t/glue
    (forv [[outer-key outer-val] data]
      (let [int-str (str/replace (name outer-key) "person-" "")]
        (forv [[inner-key inner-val] outer-val]
          (let [inner-key-str (name inner-key)]
            {:person int-str :item inner-key-str}))))))
Both methods give the same answer:
(newline) (println "reformat-gen:")
(pprint (reformat-gen data))
(newline) (println "reformat-glue:")
(pprint (reformat-glue data))
reformat-gen:
({:person "123", :item "xxx"}
 {:person "123", :item "zzz"}
 {:person "456", :item "yyy"})
reformat-glue:
[{:person "123", :item "xxx"}
 {:person "123", :item "zzz"}
 {:person "456", :item "yyy"}]
If you wanted to be "super-pure", here is a third solution (although I think this one is trying too hard!). Here we use the ability of the for macro to have nested elements in a single expression. for can also embed let expressions inside itself, although here that leads to duplicate evaluation of int-str.
(defn reformat-overboard [data]
  (for [[outer-key outer-val] data
        [inner-key inner-val] outer-val
        :let [int-str (str/replace (name outer-key) "person-" "") ; duplicate evaluation
              inner-key-str (name inner-key)]]
    {:person int-str :item inner-key-str}))
(newline)
(println "reformat-overboard:")
(pprint (reformat-overboard data))
reformat-overboard:
({:person "123", :item "xxx"}
 {:person "123", :item "zzz"}
 {:person "456", :item "yyy"})
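As a side note on the duplicate evaluation of int-str: for accepts a :let modifier after any binding, not only at the end, so one possible variation places the :let between the two bindings. The sketch below is only an illustration; the name reformat-let-hoisted is invented here, and it uses plain clojure.string rather than Tupelo.

```clojure
(require '[clojure.string :as str])

(def data
  {:person-123 {:xxx [1 5]
                :zzz [2 3 4]}
   :person-456 {:yyy [6 7]}})

;; Hypothetical variant of reformat-overboard: the :let that computes
;; int-str sits right after the outer binding, so it is evaluated once
;; per outer entry rather than once per inner entry.
(defn reformat-let-hoisted [data]
  (for [[outer-key outer-val] data
        :let [int-str (str/replace (name outer-key) "person-" "")]
        [inner-key _] outer-val]
    {:person int-str :item (name inner-key)}))
```

For data this small the saving is negligible; the placement mostly matters when the hoisted expression is expensive.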
I would probably stick with the first one since it is (at least to me) much simpler and more bulletproof. YMMV.
Notice that the 3rd method yields a single sequence of maps, even though there are 2 nested for iterations happening. This is different from having two nested for expressions, which would yield a sequence of sequences of maps.
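To make that difference concrete, here is a small sketch comparing the two shapes on the sample data. The names flat and nested are chosen for this illustration, and the person- prefix is deliberately left in place to keep the example short.

```clojure
(def data
  {:person-123 {:xxx [1 5]
                :zzz [2 3 4]}
   :person-456 {:yyy [6 7]}})

;; One `for` with two bindings: a single flat sequence of maps.
(def flat
  (for [[outer-key outer-val] data
        [inner-key _] outer-val]
    {:person (name outer-key) :item (name inner-key)}))

;; Two nested `for` expressions: a sequence of sequences of maps.
(def nested
  (for [[outer-key outer-val] data]
    (for [[inner-key _] outer-val]
      {:person (name outer-key) :item (name inner-key)})))

(count flat)                   ; 3 maps in total
(count nested)                 ; 2 inner sequences, one per person
(= flat (apply concat nested)) ; true: concatenating recovers the flat form
```

This is why the nested-for version in the question needs the extra (apply concat ...) step, while the single-for versions do not.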