Subset data frames in a list column-wise based on a vector of column names and summarize the columns

Question

I have a list of 40 data frames. A subset, for example, looks like this:

d1<-data.frame(a=c(1,2,3,4,5), b=c("2006", "2006", "2006", "2007", "2007"), d=c(6,7,8,9,10), e=c(11,12,13,14,15))
d2<-data.frame(a=c(1,2,3,4,5), b=c("2006", "2006", "2006", "2007", "2007"), d=c(6,7,8,9,10), e=c(11,12,13,14,15))
d3<-data.frame(a=c(1,2,3,4,5), b=c("2006", "2006", "2006", "2007", "2007"), d=c(6,7,8,9,10), e=c(11,12,13,14,15))

mylist <- list(l1=d1, l2=d2, l3=d3)

I want to subset each data frame based on a vector of column names:

subset_colnames <- c("a", "d", "e")

After subsetting, the data frames should look like this:

#Subsetting dataframes based on columns:
d1<-data.frame(a=c(1,2,3,4,5), b=c("2006", "2006", "2006", "2007", "2007"))
d2<-data.frame(d=c(6,7,8,9,10), b=c("2006", "2006", "2006", "2007", "2007"))
d3<-data.frame(e=c(11,12,13,14,15), b=c("2006", "2006", "2006", "2007", "2007"))

mylist_filtered = list(l1=d1, l2=d2, l3=d3)

Eventually I want to summarize the column named in subset_colnames for each data frame in the list, like so:

d1 %>% 
  group_by(b) %>% 
  summarise(mean = mean(a), n = n())

d2 %>% 
  group_by(b) %>% 
  summarise(mean = mean(d), n = n())

d3 %>% 
  group_by(b) %>% 
  summarise(mean = mean(e), n = n())

I would like to do this using lapply. I looked at the solutions here and here, but my case is slightly different in that I want to subset columns based on a character vector.

Answer 1

You can use Map with a custom function that takes a data frame from the list and a column name from subset_colnames and summarizes it. Map pairs the elements by position, so the first data frame is summarized on the first column name, and so on. To evaluate the character name as an actual column in summarise, use the rlang/tidyeval syntax:

library(dplyr); library(rlang);

cust_mean <- function(df, col) {
    df %>% 
        group_by(b) %>% 
        summarise(mean = mean(!!sym(col)), n = n())
}

Map(cust_mean, mylist, subset_colnames)
#$l1
# A tibble: 2 x 3
#       b  mean     n
#  <fctr> <dbl> <int>
#1   2006   2.0     3
#2   2007   4.5     2

#$l2
# A tibble: 2 x 3
#       b  mean     n
#  <fctr> <dbl> <int>
#1   2006   7.0     3
#2   2007   9.5     2

#$l3
# A tibble: 2 x 3
#       b  mean     n
#  <fctr> <dbl> <int>
#1   2006  12.0     3
#2   2007  14.5     2
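
Since the question asks for lapply, the same result can also be reached by iterating over positions. The following is only a minimal sketch, not part of the original answer: it assumes dplyr 0.7 or later (for the .data pronoun), and the names toy, mylist_filtered and result are illustrative. It also builds the intermediate filtered list shown in the question before summarizing.

```r
library(dplyr)

# Same toy data as in the question (three identical data frames).
toy <- data.frame(a = c(1, 2, 3, 4, 5),
                  b = c("2006", "2006", "2006", "2007", "2007"),
                  d = c(6, 7, 8, 9, 10),
                  e = c(11, 12, 13, 14, 15))
mylist <- list(l1 = toy, l2 = toy, l3 = toy)
subset_colnames <- c("a", "d", "e")

# Step 1: for the i-th data frame, keep only the i-th target column
# plus the grouping column b.
mylist_filtered <- lapply(seq_along(mylist), function(i) {
  mylist[[i]][, c(subset_colnames[i], "b")]
})
names(mylist_filtered) <- names(mylist)

# Step 2: summarise the one remaining value column in each data frame.
# .data[[col]] looks the column up by its name stored in the string col.
result <- lapply(mylist_filtered, function(df) {
  col <- setdiff(names(df), "b")
  df %>%
    group_by(b) %>%
    summarise(mean = mean(.data[[col]]), n = n())
})
```

Iterating over seq_along(mylist) is what lets lapply pair each data frame with its column name; Map does the same pairing without the explicit index.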

Solution 1
3 2017-11-10 00:25:51
