How to split a list of data.frame and apply a function to one column?

I have a small question about the apply functions. For example, I have:

l <- list(a = data.frame(A1=rep(10,5),B1=c(1,1,1,2,2),C1=c(5,10,20,7,30)),
          b = data.frame(A1=rep(20,5),B1=c(3,3,4,4,4),C1=c(3,5,10,20,30)))

I want to find the minimum C1 for each B1. The result should be:

$a
  A1 B1 C1
  10  1  5
  10  2  7

$b
  A1 B1 C1
  20  3  3
  20  4 10

I know how to do it with 'for', but there has to be an easier way with 'lapply'. I just couldn't make it work.

Please help.

What about combining lapply and tapply:

lapply(l, function(i) tapply(i$C1, i$B1, min))
$a
1 2 
5 7 

$b
 3  4 
 3 10 
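
Note that tapply returns a named vector for each list element, not the data-frame rows shown in the question. If the data-frame shape matters, one possible base-R alternative is aggregate with a formula. This is a swapped-in technique rather than part of the answer above, and the sketch assumes A1 is constant within each data frame, as it is in the example data.

```r
# Example data from the question
l <- list(a = data.frame(A1 = rep(10, 5), B1 = c(1, 1, 1, 2, 2), C1 = c(5, 10, 20, 7, 30)),
          b = data.frame(A1 = rep(20, 5), B1 = c(3, 3, 4, 4, 4), C1 = c(3, 5, 10, 20, 30)))

# Assumption: A1 is constant within each data frame, so grouping by
# A1 + B1 yields the same groups as grouping by B1 alone.
res <- lapply(l, function(d) aggregate(C1 ~ A1 + B1, data = d, FUN = min))

res$a  # one row per B1 value, with columns A1, B1, C1
```

If A1 varied within a data frame, grouping by A1 + B1 would produce one row per (A1, B1) pair, which may or may not be what you want.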

The trick to thinking about multiple operations is to split the task into bits. So,

  1. Minimum C1 for each B1. How do we do this for a single data frame?

     i <- l[[1]]
     tapply(i$C1, i$B1, min)

  2. Each element of a list? Just use lapply:

     lapply(l, function(i) tapply(i$C1, i$B1, min))

If you can't do step 1, you won't be able to manage step 2.
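
The two steps can also be kept separate in code by giving step 1 its own function, checking it on one data frame, and only then mapping it over the list. This is a minimal sketch; the helper name min_c1_by_b1 is hypothetical and chosen only for illustration.

```r
# Example data from the question
l <- list(a = data.frame(A1 = rep(10, 5), B1 = c(1, 1, 1, 2, 2), C1 = c(5, 10, 20, 7, 30)),
          b = data.frame(A1 = rep(20, 5), B1 = c(3, 3, 4, 4, 4), C1 = c(3, 5, 10, 20, 30)))

# Step 1: solve the problem for a single data frame.
# (min_c1_by_b1 is a hypothetical helper name.)
min_c1_by_b1 <- function(d) tapply(d$C1, d$B1, min)
step1 <- min_c1_by_b1(l[[1]])   # check the result on one element first

# Step 2: apply the already-tested helper to every element of the list.
step2 <- lapply(l, min_c1_by_b1)
```

Testing the helper on l[[1]] before calling lapply makes it easier to see whether a problem lies in the per-data-frame logic or in the iteration.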

Having recently succumbed to the siren song of the data.table package and its combination of versatility and speed for doing operations like this, I submit yet another solution:

library(data.table)
lapply(l, function(dat) {
    data.table(dat, key="B1,C1")[list(unique(B1)), mult="first"]
})

If retaining the original column order is important for some reason, the data.table() call could be wrapped in setcolorder(..., names(dat)).

Here's another approach that matches your desired output:

lapply(l, function(x) {
  temp <- ave(x[["C1"]], x["B1"], FUN = min)
  x[x[["C1"]] == temp, ]
})
# $a
#   A1 B1 C1
# 1 10  1  5
# 4 10  2  7
# 
# $b
#   A1 B1 C1
# 1 20  3  3
# 3 20  4 10
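
To see why this works, it can help to look at the intermediate vector that ave produces for a single data frame. The sketch below uses the first data frame from the question; the variable names group_min and keep are introduced here only for illustration.

```r
# First data frame from the question
x <- data.frame(A1 = rep(10, 5), B1 = c(1, 1, 1, 2, 2), C1 = c(5, 10, 20, 7, 30))

# ave() returns a vector as long as C1, with each value replaced
# by the minimum of its B1 group.
group_min <- ave(x[["C1"]], x["B1"], FUN = min)
group_min
# [1] 5 5 5 7 7

# Keep the rows whose C1 equals the minimum of their group.
keep <- x[["C1"]] == group_min
x[keep, ]
```

One consequence of the equality test is that if several rows in a group tie for the minimum, all of them are kept.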

You can also try llply + dcast from the plyr/reshape2 toolbox:

library(reshape2)
library(plyr)

l <- list(a = data.frame(A1=rep(10,5),B1=c(1,1,1,2,2),C1=c(5,10,20,7,30)),
          b = data.frame(A1=rep(20,5),B1=c(3,3,4,4,4),C1=c(3,5,10,20,30)))

# For each data frame, aggregate C1 by (A1, B1) using min
llply(l, function(x) dcast(x, A1 + B1 ~ ., value.var = "C1", fun.aggregate = min))
