mutate r conditional with data.frame as filter
I am trying to calculate, for a very large data set, the probabilities for each ID within one month. I came across the "mutate" function here in the forum, but it does not really work the way I want. My data looks similar to this, and I want to calculate the column P:
ID Month Day E P
1 200701 20070101 .3 .333
1 200701 20070102 .5 .333
1 200701 20070105 .5 .333
1 200702 20070106 .6 1
2 200701 20070101 .4 .5
2 200701 20070103 .3 .5
For my trials I subsetted by ID and Month and then simply used 1/length(df$Month). My idea now was to extract all unique IDs and months:
u <- subset(df, !duplicated(df$ID))
s <- subset(df, !duplicated(df$Month)) #Month is defined as date variable
and then mutate them with a formula similar to this:
mutate(df, p = 1/length(df$ID == u & df$month ==s))
Unfortunately, this does not work. Or do I have to use a loop?
Using data.table:
library(data.table)
setDT(dt)[, P := (1/.N) ,by = c("ID","Month")]
# > dt
# ID Month Day E P
#1: 1 200701 20070101 0.3 0.3333333
#2: 1 200701 20070102 0.5 0.3333333
#3: 1 200701 20070105 0.5 0.3333333
#4: 1 200702 20070106 0.6 1.0000000
#5: 2 200701 20070101 0.4 0.5000000
#6: 2 200701 20070103 0.3 0.5000000
Using dplyr (@Sotos also wrote this answer, and he wrote it first):
library(dplyr)
dt %>%
group_by(ID,Month) %>%
mutate(1/n())
# ID Month Day E 1/n()
# (int) (int) (int) (dbl) (dbl)
#1 1 200701 20070101 0.3 0.3333333
#2 1 200701 20070102 0.5 0.3333333
#3 1 200701 20070105 0.5 0.3333333
#4 1 200702 20070106 0.6 1.0000000
#5 2 200701 20070101 0.4 0.5000000
#6 2 200701 20070103 0.3 0.5000000
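For comparison, the same per-group 1/n can probably also be written in base R without a loop. This is only a minimal sketch: it assumes the data frame is called `df` as in the question and rebuilds the six sample rows by hand, and it uses `ave()` rather than either package shown above.

```r
# Sample rows copied from the question; the object name `df` is an assumption.
df <- data.frame(
  ID    = c(1, 1, 1, 1, 2, 2),
  Month = c(200701, 200701, 200701, 200702, 200701, 200701),
  Day   = c(20070101, 20070102, 20070105, 20070106, 20070101, 20070103),
  E     = c(0.3, 0.5, 0.5, 0.6, 0.4, 0.3)
)

# ave() applies length() within each ID/Month group and returns a vector
# aligned with the rows of df, so 1 / (group size) needs no explicit loop.
df$P <- 1 / ave(df$E, df$ID, df$Month, FUN = length)

print(df)
```

Assigning the result to a named column (P) also avoids the auto-generated column name 1/n() seen in the dplyr output above; in dplyr the equivalent would be mutate(P = 1/n()).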