Data frame rows to nested list elements by groups

I have a data frame like this:

  id key value
1  x   a     1
2  x   b     2
3  y   a     3
4  y   b     4

df <- read.table(text = "id key value
x a 1
x b 2
y a 3
y b 4", header = TRUE)

And I would like to get a list for each id, with a sub-list for each key.

So with my example the expected output would be:

$x
$x$a
$x$a$value
[1] 1

$x$b
$x$b$value
[1] 2

$y
$y$a
$y$a$value
[1] 3

$y$b
$y$b$value
[1] 4

list(
  x = list(
    a = list(value = 1), 
    b = list(value = 2)
  ), 
  y = list(
    a = list(value = 3), 
    b = list(value = 4)
  )
)

I can achieve it with nested lapply and split, but I think there should be a more straightforward way to do it.

Any help would be appreciated.

Overview

Two methods, one using base R and the other using plyr, that split your data frame by group, apply a function to each group, and return the results in a list. Note that both return a flat list named by id.key pairs rather than the nested list shown in the question.

Use base::split.data.frame() followed by lapply() to extract the value element for each unique id-key pair.

# split data frame
# based on 'id' and 'key' pairs
df.split <-
    split.data.frame(
        x = df
        , f = list( df$id, df$key )
    )
# keep only the value
# element within each list
df.split <-
    lapply(
        X = df.split
        , FUN = function( i )
            i[["value"]]
    )

# view results
df.split
# $x.a
# [1] 1
# 
# $y.a
# [1] 3
# 
# $x.b
# [1] 2
# 
# $y.b
# [1] 4

# end of script #
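
One caveat when splitting on a list of grouping vectors: split() builds an element for every id/key combination, so a pairing that never occurs in the data comes back as an empty data frame. A minimal sketch of how the drop argument of split() avoids that, assuming a hypothetical data frame df_missing (not part of the original question) that lacks the y/b row:

```r
# Hypothetical example data: the y/b pairing is deliberately missing
df_missing <- data.frame(
    id = c("x", "x", "y"),
    key = c("a", "b", "a"),
    value = c(1, 2, 3),
    stringsAsFactors = FALSE
)

groups <- list(df_missing$id, df_missing$key)

# Default (drop = FALSE): every id/key combination gets an element,
# including an empty data frame for the absent y.b pairing
with_empty <- split(df_missing, groups)

# drop = TRUE: only combinations that occur in the data are kept
without_empty <- split(df_missing, groups, drop = TRUE)

# Keep only the value column of each piece, as in the script above
values_only <- lapply(without_empty, function(piece) piece[["value"]])

names(with_empty)
names(values_only)
```

Whether empty groups should be kept depends on the use case; dropping them is usually what you want when the list is later converted to something like JSON.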

Use plyr::dlply() to do the same thing without the need for lapply().

# load necessary packages
library( plyr )

# splits df by the 'id' and 'key' variables
# and return the 'value' for each pairing
df.split <-
    dlply( 
        .data = df
        , .variables = c( "id", "key" )
        , .fun = function(i) i[["value"]]
    )

# view results
df.split
# $x.a
# [1] 1
# 
# $x.b
# [1] 2
# 
# $y.a
# [1] 3
# 
# $y.b
# [1] 4
# 
# attr(,"split_type")
# [1] "data.frame"
# attr(,"split_labels")
# id key
# 1  x   a
# 2  x   b
# 3  y   a
# 4  y   b

# end of script #

@Colonel Beauvel's answer to the SO post "Emulate split() with dplyr group_by: return a list of data frames" was helpful in answering this question.

One solution with a limited number of split and nested *apply calls:

# lapply() over the values keeps them numeric; apply() would coerce each row to character
lapply(split(df, df$id), function(x) setNames(lapply(x[["value"]], function(v) list(value = v)), x[["key"]]))

Nested lapply and split alternative:

lapply(split(df, df$id), function(x) lapply(split(x["value"], x$key), as.list))
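
To check the nested lapply/split version against the expected output in the question, it can be wrapped in a small function and compared with the hand-written list. A rough sketch; the helper name nest_by_id_key and the rebuilt df are assumptions for illustration, not part of the original answer:

```r
# Rebuild the example data (assumed to match the question's df)
df <- data.frame(
  id = c("x", "x", "y", "y"),
  key = c("a", "b", "a", "b"),
  value = c(1, 2, 3, 4),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split by id, then by key, keeping only 'value'
nest_by_id_key <- function(data) {
  lapply(split(data, data$id), function(by_id) {
    lapply(split(by_id["value"], by_id$key), as.list)
  })
}

nested <- nest_by_id_key(df)

# Expected structure, copied from the question
expected <- list(
  x = list(a = list(value = 1), b = list(value = 2)),
  y = list(a = list(value = 3), b = list(value = 4))
)

# all.equal() tolerates integer vs. double values, unlike identical()
isTRUE(all.equal(nested, expected))
```

The comparison uses all.equal() because a df created with read.table() stores value as integer, while the literals in the expected list are doubles.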

Improvements are welcome!
