Subset data frames in a list by column, based on a vector of column names, and summarize the columns
I have a list of 40 data frames. A subset, for example, would look like:
d1<-data.frame(a=c(1,2,3,4,5), b=c("2006", "2006", "2006", "2007", "2007"), d=c(6,7,8,9,10), e=c(11,12,13,14,15))
d2<-data.frame(a=c(1,2,3,4,5), b=c("2006", "2006", "2006", "2007", "2007"), d=c(6,7,8,9,10), e=c(11,12,13,14,15))
d3<-data.frame(a=c(1,2,3,4,5), b=c("2006", "2006", "2006", "2007", "2007"), d=c(6,7,8,9,10), e=c(11,12,13,14,15))
mylist <- list(l1=d1, l2=d2, l3=d3)
I want to subset the data frames based on a vector of column names, one name per data frame:
subset_colnames <- c("a", "d", "e")
After subsetting, the data frames should look like this:
#Subsetting dataframes based on columns:
d1<-data.frame(a=c(1,2,3,4,5), b=c("2006", "2006", "2006", "2007", "2007"))
d2<-data.frame(d=c(6,7,8,9,10), b=c("2006", "2006", "2006", "2007", "2007"))
d3<-data.frame(e=c(11,12,13,14,15), b=c("2006", "2006", "2006", "2007", "2007"))
mylist_filtered = list(l1=d1, l2=d2, l3=d3)
Eventually I want to summarize the column named in subset_colnames for each data frame in the list, like so:
d1 %>%
  group_by(b) %>%
  summarise(mean = mean(a), n = n())
d2 %>%
  group_by(b) %>%
  summarise(mean = mean(d), n = n())
d3 %>%
  group_by(b) %>%
  summarise(mean = mean(e), n = n())
I would like to do this using lapply. I looked at solutions here and here, but my operation is slightly different in that I want to subset columns based on a character vector.
You can use Map with a custom function that takes a data frame from the list and a column name from subset_colnames and summarizes it. To evaluate the character name as an actual column inside summarise, use the rlang/tidyeval syntax:
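library(dplyr)
library(rlang)

# Group by b, then summarise the column whose name is given as a string in col
cust_mean <- function(df, col) {
  df %>%
    group_by(b) %>%
    summarise(mean = mean(!!sym(col)), n = n())
}

# Pair each data frame with the column name at the same position
Map(cust_mean, mylist, subset_colnames)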
#$l1
# A tibble: 2 x 3
# b mean n
# <fctr> <dbl> <int>
#1 2006 2.0 3
#2 2007 4.5 2
#$l2
# A tibble: 2 x 3
# b mean n
# <fctr> <dbl> <int>
#1 2006 7.0 3
#2 2007 9.5 2
#$l3
# A tibble: 2 x 3
# b mean n
# <fctr> <dbl> <int>
#1 2006 12.0 3
#2 2007 14.5 2
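As a variation on the answer above, the same pairing can be sketched with dplyr's .data pronoun instead of !!sym(), and with lapply over positions, since the question asked for lapply. This is only a sketch under assumptions: the helper name cust_mean2 is hypothetical, the three example data frames stand in for the full list of 40, and it assumes subset_colnames has exactly one entry per data frame, in the same order as the list.

```r
library(dplyr)

# Example data as in the question (three identical data frames)
d1 <- data.frame(a = c(1, 2, 3, 4, 5),
                 b = c("2006", "2006", "2006", "2007", "2007"),
                 d = c(6, 7, 8, 9, 10),
                 e = c(11, 12, 13, 14, 15))
mylist <- list(l1 = d1, l2 = d1, l3 = d1)
subset_colnames <- c("a", "d", "e")

# Hypothetical helper: .data[[col]] looks up the column named by the string col
cust_mean2 <- function(df, col) {
  df %>%
    group_by(b) %>%
    summarise(mean = mean(.data[[col]]), n = n())
}

# Map pairs list elements and column names by position and keeps the list names
res_map <- Map(cust_mean2, mylist, subset_colnames)

# The same pairing with lapply, iterating over positions
res_lapply <- lapply(seq_along(mylist), function(i) {
  cust_mean2(mylist[[i]], subset_colnames[i])
})
names(res_lapply) <- names(mylist)
```

Both res_map and res_lapply should hold one summary tibble per data frame. The lapply form has to restore the list names by hand, which is one reason Map reads a little more naturally here.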