R: How to calculate mean/sd within a group, adding one row at a time

Question

I want to find out how stable group averages become as more observations are added.

Let's say I have the following data:

             email score
             <chr> <int>
 1 abc@example.com     4
 2 abc@example.com     3
 3 abc@example.com     3
 4 abc@example.com     4
 5 xyz@example.com     1
 6 xyz@example.com     4
 7 xyz@example.com     5
 8 xyz@example.com     5

Then, for the two different groups (abc@example.com, xyz@example.com), I want to calculate the mean and sd row by row, adding one row each time. So, for row 2 it should be mean(c(4,3)), sd(c(4,3)); for row 3, mean(c(4,3,3)), sd(c(4,3,3)); and so on.

The desired output for this example would be:

            email score     mean        sd
            <chr> <int>    <dbl>     <dbl>
1 abc@example.com     4 4.000000        NA
2 abc@example.com     3 3.500000 0.7071068
3 abc@example.com     3 3.333333 0.5773503
4 abc@example.com     4 3.500000 0.5773503
5 xyz@example.com     1 1.000000        NA
6 xyz@example.com     4 2.500000 2.1213203
7 xyz@example.com     5 3.333333 2.0816660
8 xyz@example.com     5 3.750000 1.8929694

How do I implement this in R? Thanks

Answer 1

This might work for you

Your data

df <- read.table(text="email score
 1 abc@example.com     4
 2 abc@example.com     3
 3 abc@example.com     3
 4 abc@example.com     4
 5 xyz@example.com     1
 6 xyz@example.com     4
 7 xyz@example.com     5
 8 xyz@example.com     5", header=TRUE)

Solution

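library(tidyverse)
df %>%
  group_by(email) %>%
  nest() %>%  # one row per email; the scores go into a list-column named data
  mutate(data = map(data, ~ map_df(
    seq_len(nrow(.x)),
    # mean and sd of the first i scores in this group
    function(i) tibble(mean = mean(.x$score[1:i]), sd = sd(.x$score[1:i]))
  ))) %>%
  unnest(data)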

Output

# A tibble: 8 x 3
            # email     mean        sd
           # <fctr>    <dbl>     <dbl>
# 1 abc@example.com 4.000000        NA
# 2 abc@example.com 3.500000 0.7071068
# 3 abc@example.com 3.333333 0.5773503
# 4 abc@example.com 3.500000 0.5773503
# 5 xyz@example.com 1.000000        NA
# 6 xyz@example.com 2.500000 2.1213203
# 7 xyz@example.com 3.333333 2.0816660
# 8 xyz@example.com 3.750000 1.8929694
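
If the score column should stay in the result, as in the question's desired output, the same expanding-window idea can probably be written without nesting. This is a minimal sketch, assuming dplyr and purrr are installed and that the rows are already in the intended order within each email; the name running_stats is only illustrative and the data frame is rebuilt here so the block runs on its own.

```r
library(dplyr)
library(purrr)

# Same data as in the question, rebuilt so this block is self-contained
df <- data.frame(
  email = rep(c("abc@example.com", "xyz@example.com"), each = 4),
  score = c(4, 3, 3, 4, 1, 4, 5, 5)
)

running_stats <- df %>%
  group_by(email) %>%
  mutate(
    # cumulative mean within each email
    mean = cummean(score),
    # sd over the first i scores of the group; sd() of a single value is NA
    sd = map_dbl(seq_along(score), function(i) sd(score[seq_len(i)]))
  ) %>%
  ungroup()

running_stats
```

cummean() handles the mean in one vectorised pass; the sd still recomputes each prefix, which is fine for small groups but grows quadratically with group size.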

Answer 2

If these are ordered observations, use rep() to build a group variable and then aggregate on it. It'd be easier if you had a proper reprex, but I'll try to work with your example:

# Index each observation within its id: in general rep(a:b, n), where a:b spans
# the observations per id and n is the number of unique ids
df$group <- rep(1:4, 2)  # 4 observations for each of 2 emails in this example

temp1 <- aggregate(score ~ group, data = df, FUN = mean)  # mean of score at each position
temp2 <- aggregate(score ~ group, data = df, FUN = sd)    # sd of score at each position

# Combine both summaries, one row per position.
# Note: these are summaries across emails at each position, not running values within an email.
out <- merge(temp1, temp2, by = "group", suffixes = c("_mean", "_sd"))

This isn't perfect, nor is it particularly lean, but the logic should help you solve your problem.
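
For a base R route to the running mean and sd within each email, ave() can apply an expanding-window function per group. The following is only a sketch, assuming the rows are ordered as intended; the helpers running_mean and running_sd are hypothetical names, and the sd is derived from cumulative sums rather than repeated sd() calls.

```r
# Same data as in the question, rebuilt so this block is self-contained
df <- data.frame(
  email = rep(c("abc@example.com", "xyz@example.com"), each = 4),
  score = c(4, 3, 3, 4, 1, 4, 5, 5)
)

# Hypothetical helper: mean of the first i elements, for every i
running_mean <- function(x) cumsum(x) / seq_along(x)

# Hypothetical helper: sample sd of the first i elements, for every i
running_sd <- function(x) {
  n <- seq_along(x)
  # sample variance from running sums: (sum(x^2) - sum(x)^2 / n) / (n - 1)
  v <- (cumsum(x^2) - cumsum(x)^2 / n) / (n - 1)
  v[n == 1] <- NA        # sd of a single observation is undefined
  sqrt(pmax(v, 0))       # guard against tiny negative values from rounding
}

# ave() applies each helper separately within every email
df$mean <- ave(df$score, df$email, FUN = running_mean)
df$sd   <- ave(df$score, df$email, FUN = running_sd)

df
```

The cumulative-sum formula avoids recomputing each prefix, but it can lose precision when the values are large relative to their spread; for such data, calling sd() on each prefix is the safer choice.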

Answer 1: score 2, accepted, 2017-10-24 14:04:45

Answer 2: score 0, 2017-10-24 14:12:25
