
lapply or sapply for data.frames in a list

I am experimenting with alternatives to the dplyr way of summarising data. I like the split-and-apply approach but need some help.

library(Hmisc)
library(data.table)

summary <- function(x) {
    funs <- c(wtd.mean, wtd.var)
    sapply(funs, function(f) f(x, na.rm = TRUE))
}


df <- split(mtcars, f = mtcars$cyl)

store <- list()

for(i in seq_along(df)) {
    store[[i]] <- data.frame(sapply(df[[i]], summary)) 
}

finaldf <- data.table::rbindlist(store)

finaldf

Here is my code. With the split function I get three data frames, and the loop produces the summarised values for each. But after that my code gets a little messy: creating an empty list, converting the matrix to a data.frame inside the loop, and so on.

Is there a way to combine multiple apply functions to avoid this loop? Something like lapply(sapply(...))?

We can use lapply and avoid initializing the list:

library(data.table)
lst <- lapply(df,  function(dat) data.frame(lapply(dat, summary)))
rbindlist(lst, idcol = 'grp')
#   grp       mpg cyl      disp         hp      drat        wt      qsec         vs        am      gear      carb
#1:   4 26.663636   4  105.1364   82.63636 4.0709091 2.2857273 19.137273 0.90909091 0.7272727 4.0909091 1.5454545
#2:   4 20.338545   0  722.0825  438.25455 0.1335691 0.3244028  2.830622 0.09090909 0.2181818 0.2909091 0.2727273
#3:   6 19.742857   6  183.3143  122.28571 3.5857143 3.1171429 17.977143 0.57142857 0.4285714 3.8571429 3.4285714
#4:   6  2.112857   0 1727.4381  588.57143 0.2266286 0.1269821  2.913390 0.28571429 0.2857143 0.4761905 3.2857143
#5:   8 15.100000   8  353.1000  209.21429 3.2292857 3.9992143 16.772143 0.00000000 0.1428571 3.2857143 3.5000000
#6:   8  6.553846   0 4592.9523 2598.64286 0.1386533 0.5766956  1.430449 0.00000000 0.1318681 0.5274725 2.4230769
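
The output above has two rows per group, the first holding the means and the second the variances, but nothing in the table says which is which. Below is a minimal base-R sketch of the same split/lapply idea that labels each row. It assumes mean() and var() are acceptable stand-ins for wtd.mean() and wtd.var() (they agree when no weights are passed), and the names summarise_col, per_group and result are made up for illustration.

```r
# Base-R sketch: mean()/var() replace Hmisc's wtd.mean()/wtd.var() (assumption:
# no weights are used, in which case the results are the same).
summarise_col <- function(x) {
  c(mean = mean(x, na.rm = TRUE), var = var(x, na.rm = TRUE))
}

groups <- split(mtcars, mtcars$cyl)

# sapply() over the columns returns a 2-row matrix with rownames "mean" and "var";
# turn it into a data.frame and keep the rownames as a 'stat' column.
per_group <- lapply(groups, function(dat) {
  m <- sapply(dat, summarise_col)
  data.frame(stat = rownames(m), m, row.names = NULL)
})

# Prepend the group label to each piece, then stack the pieces.
result <- do.call(
  rbind,
  Map(function(d, g) cbind(grp = g, d), per_group, names(per_group))
)
rownames(result) <- NULL
result
```

Because the statistic names travel with the data as a column, the result can be filtered or reshaped later without relying on row order.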

The steps can also be simplified considerably if we use data.table's group-by methods:

as.data.table(mtcars)[, lapply(.SD, summary), by = cyl]

Or, instead of sapply-ing over the functions, apply each one individually and concatenate the output:

summary1 <- function(x)  c(wtd.mean(x, na.rm = TRUE), wtd.var(x, na.rm = TRUE))
as.data.table(mtcars)[, lapply(.SD, summary1), by = cyl]
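
As a possible variation on the grouped call, the statistic name can be returned alongside the values so that each of the two rows per group is labelled. This is only a sketch: summary2 and res are hypothetical names, and mean()/var() are used in place of wtd.mean()/wtd.var() on the assumption that no weights are needed.

```r
library(data.table)

# Stand-in summary function (assumption: unweighted, so mean()/var() match
# what wtd.mean()/wtd.var() would return).
summary2 <- function(x) c(mean(x, na.rm = TRUE), var(x, na.rm = TRUE))

dt <- as.data.table(mtcars)

# j returns a list: a 'stat' label column followed by one summarised column
# per element of .SD (every column except the grouping column cyl).
res <- dt[, c(list(stat = c("mean", "var")), lapply(.SD, summary2)), by = cyl]
res
```

With by = cyl the groups appear in order of first appearance in the data; keyby = cyl would sort them instead.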
