lapply or sapply for data.frames in a list
I am experimenting with alternatives to the dplyr way of summarising data. I like the split-and-apply approach but need some help.
library(Hmisc)
library(data.table)
summary <- function(x) {
funs <- c(wtd.mean, wtd.var)
sapply(funs, function(f) f(x, na.rm = TRUE))
}
df <- split(mtcars, f = mtcars$cyl)
store <- list()
for(i in 1:length(df)) {
store[[i]] <- data.frame(sapply(df[[i]], summary))
}
finaldf <- data.table::rbindlist(store)
finaldf
Here is my code. With the split function I get three data frames with summarised values. After that, though, my code gets a bit messy: I create an empty list, convert the matrix to a data.frame inside the loop, and so on.
Is there a way to combine multiple apply functions to avoid this loop? Something like lapply(sapply(...))?
We can use lapply and avoid initializing the list:
library(data.table)
lst <- lapply(df, function(dat) data.frame(lapply(dat, summary)))
rbindlist(lst, idcol = 'grp')
# grp mpg cyl disp hp drat wt qsec vs am gear carb
#1: 4 26.663636 4 105.1364 82.63636 4.0709091 2.2857273 19.137273 0.90909091 0.7272727 4.0909091 1.5454545
#2: 4 20.338545 0 722.0825 438.25455 0.1335691 0.3244028 2.830622 0.09090909 0.2181818 0.2909091 0.2727273
#3: 6 19.742857 6 183.3143 122.28571 3.5857143 3.1171429 17.977143 0.57142857 0.4285714 3.8571429 3.4285714
#4: 6 2.112857 0 1727.4381 588.57143 0.2266286 0.1269821 2.913390 0.28571429 0.2857143 0.4761905 3.2857143
#5: 8 15.100000 8 353.1000 209.21429 3.2292857 3.9992143 16.772143 0.00000000 0.1428571 3.2857143 3.5000000
#6: 8 6.553846 0 4592.9523 2598.64286 0.1386533 0.5766956 1.430449 0.00000000 0.1318681 0.5274725 2.4230769
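The output above has two rows per group, but nothing says which row is the mean and which is the variance. As a minimal sketch of the lapply(sapply(...)) shape the question asks about, the version below stacks the per-group matrices and adds a label column. The names stats, groups, mats, stacked and result are made up for illustration, and mean/var are base-R stand-ins that should agree with wtd.mean and wtd.var as long as no weights are supplied.

```r
# Assumption: unweighted statistics, so base mean/var stand in for
# Hmisc::wtd.mean / Hmisc::wtd.var and the sketch runs without Hmisc.
stats <- function(x) c(mean = mean(x, na.rm = TRUE), var = var(x, na.rm = TRUE))

groups <- split(mtcars, mtcars$cyl)

# Inner sapply: a 2 x 11 matrix (one row per statistic, one column per variable).
# Outer lapply: one such matrix per cyl group.
mats <- lapply(groups, function(dat) sapply(dat, stats))

# Stack the matrices; drop the repeated "mean"/"var" row names so they
# can be stored in an ordinary column instead.
stacked <- do.call(rbind, mats)
rownames(stacked) <- NULL

result <- data.frame(
  grp  = rep(names(mats), each = 2),
  stat = rep(c("mean", "var"), times = length(mats)),
  stacked
)
result
```

This stays in base R; rbindlist with idcol, as in the answer, does the stacking step in one call when data.table is already loaded.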
The steps can be simplified much further if we use data.table's group-by syntax:
as.data.table(mtcars)[, lapply(.SD, summary), by = cyl]
Or, instead of sapply-ing over the functions, apply each one individually and concatenate the output:
summary1 <- function(x) c(wtd.mean(x, na.rm = TRUE), wtd.var(x, na.rm = TRUE))
as.data.table(mtcars)[, lapply(.SD, summary1), by = cyl]
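Two details of the grouped call are easy to miss: plain by returns groups in order of first appearance (6, 4, 8 for mtcars) rather than sorted as split does, and the two rows per group are unlabeled. The sketch below is one possible way to handle both, using keyby and an extra stat column. The names dt and res and the stat column are illustrative, and summary1 is redefined here with base mean/var as a stand-in for the Hmisc version, assuming no weights are used.

```r
library(data.table)

# Assumption: unweighted statistics, so base mean/var replace
# Hmisc::wtd.mean / Hmisc::wtd.var in this sketch.
summary1 <- function(x) c(mean(x, na.rm = TRUE), var(x, na.rm = TRUE))

dt <- as.data.table(mtcars)

# keyby sorts the groups (4, 6, 8); plain `by` keeps order of first appearance.
# The `stat` column labels the two rows that summary1 produces for each group.
res <- dt[, c(list(stat = c("mean", "var")), lapply(.SD, summary1)), keyby = cyl]
res
```

Because cyl is the grouping variable it is excluded from .SD, so it appears once as the group column instead of being summarised like the other variables.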