
Mean across each element of a tibble list-column by group with purrr and dplyr

I'm trying to get used to the tidyverse, and I don't know whether my data is well suited to functions like map(). I like the organization of list-columns, so I am wondering how to combine group_by(), summarize(), map(), and other functions to get this to work. I know how to use these functions with vector columns, but not how to approach list-columns.

Sample data:

library(tidyverse)

set.seed(3949)
myList <- replicate(12, sample(1:20, size = 10), simplify = FALSE)

tibble(
  group = rep(c("A", "B"), each = 6),
  data = myList
)

Each vector in the list-column has ten elements, which are the values for ten trials. What I would like to do is group the tibble by group and then find the "column" mean and standard error (se) of the expanded lists. In other words, I'm treating the list-column as a matrix, with the vector from each row of the tibble bound together as a row. The output should also have columns for the group and the trial so that it is in the correct format for ggplot2.

        mean        se group trial
1   6.000000 1.6329932     A     1
2  12.666667 2.3333333     A     2
3  12.333333 2.8007935     A     3
4  13.833333 1.8150605     A     4
5   8.166667 3.1028661     A     5
6  11.500000 2.9410882     A     6
7  13.666667 2.3758040     A     7
8   6.833333 1.7779514     A     8
9  11.833333 2.3009660     A     9
10  8.666667 1.7061979     A    10
11  8.333333 1.6865481     B     1
12 12.166667 2.6002137     B     2
13 10.000000 2.7080128     B     3
14 11.833333 3.1242777     B     4
15  4.666667 1.2823589     B     5
16 12.500000 3.0413813     B     6
17  6.000000 1.5055453     B     7
18  8.166667 1.6616591     B     8
19 11.000000 2.6708301     B     9
20 13.166667 0.9457507     B    10

Here is how I would normally do something like this:

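# Standard error of the mean; this helper is not defined in the original
# snippet, so its definition here is an assumption (sd / sqrt(n))
se <- function(x) sd(x) / sqrt(length(x))

set.seed(3949)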

data.frame(group = rep(c("A", "B"), each = 6)) %>%
  cbind(replicate(12, sample(1:20, size = 10)) %>% t()) %>%
  split(.$group) %>%
  lapply(function(x) data.frame(mean = colMeans(x[ ,2:11]),
                                se = apply(x[ ,2:11], 2, se))) %>%
  do.call(rbind,.) %>%
  mutate(group = substr(row.names(.), 1,1),
         trial = rep(1:10, 2)) %>% 

  ggplot(aes(x = trial, y = mean)) +
  geom_point() +
  geom_line() +
  facet_grid(~ group) +
  scale_x_continuous(limits = c(1,10), breaks = seq(1, 10, 1)) +
  geom_errorbar(aes(ymin = mean-se, ymax = mean+se), color = "black") + 
  theme_bw()

Is there a cleaner way to do this with the tidyverse functions?

I think that another way is to use nest() and map().

library(tidyverse)
library(plotrix) #For the std.error

# Your second sample dataset
set.seed(3949)
df <- data.frame(group = rep(c("A", "B"), each = 6)) %>%
  cbind(replicate(12, sample(1:20, size = 10)) %>% t()) 


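df %>% 
  nest(data = -group) %>%                            # one 6 x 10 data frame per group
  mutate(mean = map(data, ~ colMeans(.x)),           # per-trial mean (columns are trials)
         se = map(data, ~ plotrix::std.error(.x)),   # per-trial standard error
         trial = map(data, ~ seq_len(ncol(.x)))) %>%
  unnest(cols = c(mean, se, trial)) %>% 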
  ggplot(aes(x = trial, y = mean)) +
  geom_point() +
  geom_line() +
  facet_grid(~ group) +
  geom_errorbar(aes(ymin = mean-se, ymax = mean+se), color = "black") + 
  theme_bw()
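
If you would rather start from the list-column tibble in the question than from the wide data frame, one possible sketch is to attach a trial index to each vector, unnest both, and then summarise by group and trial. The names dat, trial_summary, and std_err are made up for this illustration, and std_err assumes the usual sd / sqrt(n) definition of the standard error.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(tibble)

# Hypothetical helper: standard error of the mean, assumed to be sd / sqrt(n)
std_err <- function(x) sd(x) / sqrt(length(x))

# The list-column sample data from the question
set.seed(3949)
myList <- replicate(12, sample(1:20, size = 10), simplify = FALSE)

dat <- tibble(
  group = rep(c("A", "B"), each = 6),
  data = myList
)

trial_summary <- dat %>%
  # Pair every vector with its trial index so the position survives unnesting
  mutate(trial = map(data, seq_along)) %>%
  # Expand to one row per (original row, trial) pair
  unnest(cols = c(data, trial)) %>%
  # Ordinary grouped summary on the now-flat columns
  group_by(group, trial) %>%
  summarise(mean = mean(data), se = std_err(data), .groups = "drop")

trial_summary
```

The result has one row per group and trial, so it can be piped into the same ggplot() call as above. The trade-off is that the list-column is flattened before summarising, which keeps the summary step in plain group_by() and summarise() terms.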
