R: Create a column of averages based upon groups of four rows
>head(df)
person week target actual drop_out organization agency
1: QJ1 1 30 19 TRUE BB LLC
2: GJ2 1 30 18 FALSE BB LLC
3: LJ3 1 30 22 TRUE CC BBR
4: MJ4 1 30 24 FALSE CC BBR
5: PJ5 1 35 55 FALSE AA FUN
6: EJ6 1 35 50 FALSE AA FUN
There are about 30 weeks in the dataset, and each person ID repeats every week.
I want to look at each person's values four weeks at a time (weeks 1-4, 5-8, 9-12, and so on). For each of these chunks, I want to add up all the "actual" values and divide by the sum of the "target" values. That value would then go in a column called "monthly percent."
As per Shape's recommendation, I've created a month column like so:
fullReshapedDT$month <- with(fullReshapedDT, ceiling(week/4))
Now I'm trying to figure out how to iterate over the month column and calculate the averages. I tried something like this, but it obviously doesn't work:
fullReshapedDT[,.(monthly_attendance = actual/target,by=.(person_id, month)]
Have you tried creating a group variable? It lets you group operations by four-week period:
setDT(df1)[, grps := ceiling(week / 4)            # create 4-week groups
  ][, sum(actual) / sum(target), .(person, grps)  # ratio per person and 4-week group
  ][, grps := NULL][]                             # drop the group column from the result
# person V1
# 1: QJ1 1.1076923
# 2: GJ2 1.1128205
# 3: LJ3 0.9948718
# 4: MJ4 0.6333333
# 5: PJ5 1.2410256
# 6: EJ6 1.0263158
# 7: QJ1 1.2108108
# 8: GJ2 0.6378378
# 9: LJ3 0.9891892
# 10: MJ4 0.8564103
# 11: PJ5 1.1729730
# 12: EJ6 0.8666667
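The result above is a summary table with one row per person and four-week group. If the goal is instead to keep every weekly row and attach the ratio as a new column, as the question describes, one option is to assign with `:=` inside a grouped call. The following is only a minimal sketch: the toy data, the object name `dt`, and the column name `monthly_percent` are made up for illustration and are not from the original post.

```r
library(data.table)

# Hypothetical toy data: two people, eight weeks each (values are invented)
dt <- data.table(
  person = rep(c("QJ1", "GJ2"), each = 8),
  week   = rep(1:8, times = 2),
  target = 30,
  actual = c(19, 25, 30, 28, 31, 29, 33, 27,
             18, 20, 22, 24, 30, 30, 30, 30)
)

# Weeks 1-4 -> month 1, weeks 5-8 -> month 2, and so on
dt[, month := ceiling(week / 4)]

# := combined with by= writes the group-level ratio onto every row of the group
dt[, monthly_percent := sum(actual) / sum(target), by = .(person, month)]

dt[]
```

Note that the value is a ratio (for example 0.85), so multiply by 100 if an actual percentage is wanted. Because `:=` modifies the table by reference, all original columns are kept and no separate merge step is needed.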