
mutate in R conditionally, with a data.frame as a filter

I am trying to calculate, for a very large data set, the probability of each row for a given ID within one month. Searching the forum, I came across the mutate() function, but it does not work the way I want. My data looks similar to this, and I want to calculate column P:

ID  Month   Day       E   P
1   200701  20070101  .3  .333
1   200701  20070102  .5  .333
1   200701  20070105  .5  .333
1   200702  20070106  .6  1
2   200701  20070101  .4  .5
2   200701  20070103  .3  .5
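Reading the table, P is the uniform probability of a row within its ID-Month group: one divided by the number of rows that share the same ID and Month. Written out as a formula (n_i is notation introduced here, not from the question):

```latex
P_i = \frac{1}{n_i}, \qquad
n_i = \bigl|\{\, j : \mathrm{ID}_j = \mathrm{ID}_i,\ \mathrm{Month}_j = \mathrm{Month}_i \,\}\bigr|
```

For ID 1 in 200701 there are three rows, so P = 1/3 ≈ .333; ID 1 in 200702 has one row, so P = 1; ID 2 in 200701 has two rows, so P = .5.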

For my trials I subsetted the data by ID and Month and then simply used 1/length(df$Month). My idea now was to extract all unique IDs and months:

u <- subset(df, !duplicated(df$ID))
s <- subset(df, !duplicated(df$Month)) #Month is defined as date variable

and then use them in mutate() with a formula similar to this:

mutate(df, p =  1/length(df$ID == u & df$month ==s))

Unfortunately, this does not work.
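One reason the formula above cannot produce per-group counts: length() of a comparison returns the number of elements that were compared, not the number of matches, so 1/length(...) ends up the same for every row. A small sketch with a hypothetical ID vector shows the difference between length() and sum():

```r
# Hypothetical vector: the ID column of the example data
ids <- c(1, 1, 1, 1, 2, 2)

# A comparison returns one TRUE/FALSE per element (here 4 TRUE, 2 FALSE)
matches <- ids == 1

# length() counts every element compared, regardless of TRUE/FALSE
print(length(matches))  # 6

# sum() treats TRUE as 1, so it counts only the matching rows
print(sum(matches))     # 4
```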

Or do I have to use a loop?
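No explicit loop should be needed. As a base-R sketch (assuming the column names from the question), ave() applies a function within each ID-Month group and returns a vector aligned with the original rows:

```r
# Example data from the question
df <- data.frame(
  ID    = c(1, 1, 1, 1, 2, 2),
  Month = c(200701, 200701, 200701, 200702, 200701, 200701),
  Day   = c(20070101, 20070102, 20070105, 20070106, 20070101, 20070103),
  E     = c(0.3, 0.5, 0.5, 0.6, 0.4, 0.3)
)

# ave() splits the first argument by every ID-Month combination, applies FUN
# to each piece, and writes the (recycled) result back in the original row
# order; FUN must be passed by name because ave() takes grouping vectors via ...
df$P <- ave(rep(1, nrow(df)), df$ID, df$Month,
            FUN = function(x) 1 / length(x))

print(df)
```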

Using data.table:

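library(data.table)
# setDT() converts dt to a data.table in place; := then adds column P by reference,
# where .N is the number of rows in the current ID-Month group
setDT(dt)[, P := 1/.N, by = c("ID", "Month")]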
# > dt
#   ID  Month      Day   E         P
#1:  1 200701 20070101 0.3 0.3333333
#2:  1 200701 20070102 0.5 0.3333333
#3:  1 200701 20070105 0.5 0.3333333
#4:  1 200702 20070106 0.6 1.0000000
#5:  2 200701 20070101 0.4 0.5000000
#6:  2 200701 20070103 0.3 0.5000000
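Because P is 1/.N within each ID-Month group, the values in every group should add up to exactly 1. A sketch of that sanity check, rebuilding the example data as a data.table (assuming the data.table package is installed):

```r
library(data.table)

# Example data from the question
dt <- data.table(
  ID    = c(1, 1, 1, 1, 2, 2),
  Month = c(200701, 200701, 200701, 200702, 200701, 200701),
  Day   = c(20070101, 20070102, 20070105, 20070106, 20070101, 20070103),
  E     = c(0.3, 0.5, 0.5, 0.6, 0.4, 0.3)
)

# Same grouped assignment as in the answer: .N is the row count per group
dt[, P := 1 / .N, by = .(ID, Month)]

# Sanity check: one row per ID-Month group with the summed probability
check <- dt[, .(total = sum(P), rows = .N), by = .(ID, Month)]
print(check)
```

The same check scales to the full data set: any total that is not 1 would point to a grouping problem.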

Using dplyr (@Sotos also posted this answer, and he posted it first):

library(dplyr)
dt %>% 
  group_by(ID, Month) %>%   # one group per ID-Month combination
  mutate(P = 1/n())         # n() is the number of rows in the current group

#     ID  Month      Day     E         P
#  (int)  (int)    (int) (dbl)     (dbl)
#1     1 200701 20070101   0.3 0.3333333
#2     1 200701 20070102   0.5 0.3333333
#3     1 200701 20070105   0.5 0.3333333
#4     1 200702 20070106   0.6 1.0000000
#5     2 200701 20070101   0.4 0.5000000
#6     2 200701 20070103   0.3 0.5000000
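A variant that avoids leaving the result grouped is dplyr's add_count(), which appends the group size as a column named n; P is then 1/n. This is a sketch assuming a dplyr version that provides add_count() (0.7.0 or later) and the column names from the question:

```r
library(dplyr)

# Example data from the question
df <- data.frame(
  ID    = c(1, 1, 1, 1, 2, 2),
  Month = c(200701, 200701, 200701, 200702, 200701, 200701),
  Day   = c(20070101, 20070102, 20070105, 20070106, 20070101, 20070103),
  E     = c(0.3, 0.5, 0.5, 0.6, 0.4, 0.3)
)

res <- df %>%
  add_count(ID, Month) %>%   # adds column n = rows per ID-Month group
  mutate(P = 1 / n) %>%      # uniform probability within the group
  select(-n)                 # drop the helper count column

print(res)
```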

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint this content, please cite the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 
Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM