R: Aggregate dynamically based on previous entries and ID

Question

I have a large dataset that, simplified, looks like this:

row.    member_id   entry_id    comment_count   timestamp
1       1            a              4           2008-06-09 12:41:00
2       1            b              1           2008-07-14 18:41:00
3       1            c              3           2008-07-17 15:40:00
4       2            d              12          2008-06-09 12:41:00
5       2            e              50          2008-09-18 10:22:00
6       3            f              0           2008-10-03 13:36:00

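Now I want to create a new column that sums the comment_count of all previous entries (entry_id) of the same member. In other words, for each entry I only want to sum the comment_counts of entries that occurred up to that entry's timestamp (as the expected output shows, the current entry's own count is included). I can sort my dataset by member_id and timestamp.

The result should look like this: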

row.    member_id   entry_id    comment_count   timestamp             aggregated_count
1       1            a              4           2008-06-09 12:41:00        4
2       1            b              1           2008-07-14 18:41:00        5
3       1            c              3           2008-07-17 15:40:00        8
4       2            d              12          2008-06-09 12:41:00        12
5       2            e              50          2008-09-18 10:22:00        62
6       3            f              0           2008-10-03 13:36:00        0

Any idea how to do this in R (or Stata)? I tried aggregate, but I don't see how to sum only the comment_counts that come before the current timestamp and belong to the current member_id.
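To pin down the rule described above, here is a direct translation of it into base R: for each row, sum comment_count over rows of the same member whose timestamp is not later than the current one. It is a quadratic-time sketch meant only as a reference for checking faster solutions on small data; the helper name running_total_bruteforce, the include_current flag, and parsing timestamp as POSIXct in UTC are all assumptions, not part of the question.

```r
# Sample data from the question (parsing timestamp as POSIXct in UTC is an assumption).
df <- data.frame(
  member_id     = c(1, 1, 1, 2, 2, 3),
  entry_id      = c("a", "b", "c", "d", "e", "f"),
  comment_count = c(4, 1, 3, 12, 50, 0),
  timestamp     = as.POSIXct(c("2008-06-09 12:41:00", "2008-07-14 18:41:00",
                               "2008-07-17 15:40:00", "2008-06-09 12:41:00",
                               "2008-09-18 10:22:00", "2008-10-03 13:36:00"),
                             tz = "UTC"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for each row i, sum comment_count over rows of the same
# member whose timestamp is <= timestamp[i] (or strictly < when include_current = FALSE).
running_total_bruteforce <- function(d, include_current = TRUE) {
  vapply(seq_len(nrow(d)), function(i) {
    same_member <- d$member_id == d$member_id[i]
    earlier <- if (include_current) {
      d$timestamp <= d$timestamp[i]
    } else {
      d$timestamp < d$timestamp[i]
    }
    sum(d$comment_count[same_member & earlier])
  }, numeric(1))
}

df$aggregated_count <- running_total_bruteforce(df)
print(df)
```

Because it compares timestamps instead of relying on row order, this version does not need the data to be sorted, and entries of one member that share an identical timestamp are counted together.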

Answer 1

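Try this (assuming df is your data and is already sorted by member_id and timestamp, since cumsum() adds values in row order):

# ave() applies cumsum() separately within each member_id group
transform(df, aggregated_count = ave(comment_count, member_id, FUN = cumsum))
#   member_id entry_id comment_count           timestamp aggregated_count
# 1         1        a             4 2008-06-09 12:41:00                4
# 2         1        b             1 2008-07-14 18:41:00                5
# 3         1        c             3 2008-07-17 15:40:00                8
# 4         2        d            12 2008-06-09 12:41:00               12
# 5         2        e            50 2008-09-18 10:22:00               62
# 6         3        f             0 2008-10-03 13:36:00                0

A couple of other ways (added for efficiency on large data):

library(data.table)
# setDT() converts df to a data.table by reference; := adds the column in place
setDT(df)[, aggregated_count := cumsum(comment_count), by = member_id]

or

library(dplyr)
df %>%
  group_by(member_id) %>%
  mutate(aggregated_count = cumsum(comment_count))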

Answer 2

Using Stata:

clear
set more off

*----- example data -----

input ///
row    member_id   str1 entry_id    comment_count   str30 timestamp
1       1            a              4           "2008-06-09 12:41:00"
2       1            b              1           "2008-07-14 18:41:00"
3       1            c              3           "2008-07-17 15:40:00"
4       2            d              12          "2008-06-09 12:41:00"
5       2            e              50          "2008-09-18 10:22:00"
6       3            f              0           "2008-10-03 13:36:00"
end

list

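*----- what you want -----

* group by member_id, ordering by timestamp within each group;
* sum() inside generate is a running sum that restarts in every by-group
bysort member_id (timestamp): gen aggregated_count = sum(comment_count)

list

This only requires the by: prefix together with sum(), which in generate computes a running sum. Putting timestamp in parentheses makes bysort also order the observations by time within each member_id (the string timestamps sort chronologically in this format), so the running sum follows the order of the entries.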

2 answers

Answer 1: 2 votes, accepted, 2014-12-16 22:51:58

Answer 2: 2 votes, 2014-12-16 22:57:53