
Find the 75th percentile and replace values above it with the median for each group in R

This problem is similar to my earlier question, "calculation of 90 percentile and replacement of it by median by groups in R", with one difference.

In that question, the percentile and median were calculated from the 14 zero-action rows preceding the action == 1 rows, while the replacement by the median was applied to all zero-action rows, separately for each code + item group.

Now I want to use all zero-action rows, not just the 14 preceding ones, and leave negative and zero values of return untouched.

For the zero category of the grouping variable action (0, 1), I want to find the 75th percentile of the return variable; any value above the 75th percentile must be replaced by the median of the zero category. This must be done separately for each value of the code variable. Note: negative and zero values must not be touched.

mydat=structure(list(code = c(123L, 123L, 123L, 123L, 123L, 123L, 123L, 
123L, 123L, 123L, 123L, 123L, 124L, 124L, 124L, 124L, 124L, 124L, 
124L, 124L, 124L, 124L, 124L, 124L), action = c(0L, 0L, 0L, 0L, 
0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 
1L, 1L, 1L, 1L), return = c(-1L, 0L, 23L, 100L, 18L, 15L, -1L, 
0L, 23L, 100L, 18L, 15L, -1L, 0L, 23L, 100L, 18L, 15L, -1L, 0L, 
23L, 100L, 18L, 15L)), .Names = c("code", "action", "return"), class = "data.frame", row.names = c(NA, 
-24L))

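For the zero category of each code, the positive return values are

23
100
18
15

How can I get the output below? For these values the 75th percentile is 42.25 and the median is 20.5, so 100 is replaced by 20.5: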
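As a quick check of these two numbers, here is a minimal sketch in base R. The vector pos is just the four positive zero-category returns from the sample data, and quantile() is assumed to use its default (type 7) interpolation:

```r
# Positive returns in the zero-action rows of one code (from the sample data)
pos <- c(23, 100, 18, 15)

# 75th percentile with R's default interpolation (type 7), and the median
q75 <- unname(quantile(pos, 0.75))
med <- median(pos)
print(c(q75 = q75, med = med))

# Values above the 75th percentile are replaced by the median
replaced <- ifelse(pos > q75, med, pos)
print(replaced)
```

Only 100 exceeds the 75th percentile here, so it is the only value that changes.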

code  action  return
123   0       -1
123   0       0
123   0       23
123   0       ***20.5
123   0       18
123   0       15
123   1       -1
123   1       0
123   1       23
123   1       100
123   1       18
123   1       15
124   0       -1
124   0       0
124   0       23
124   0       ***20.5
124   0       18
124   0       15
124   1       -1
124   1       0
124   1       23
124   1       100
124   1       18
124   1       15

Using Uwe's great solution, I get this error:

Error in `[.data.table`(mydat[action == 0, `:=`(output, as.double(return))],  : 
  Column(s) [action] not found in i

How can I leave negative and zero values untouched, and why did this error occur? This is the code I used:

library(data.table)
# mark the zero action rows before the action period
setDT(mydat)[, zero_before := cummax(action), by = .(code)]
# compute median and 90% quantile for that last 14 rows before each action period 
agg <- mydat[zero_before == 0, 
             quantile(tail(return), c(0.5, 0.75)) %>% 
               as.list()  %>% 
               set_names(c("med", "q90")) %>% 
               c(.(zero_before = 0)), by = .(code)]
agg


# append output column
mydat[action == 0, output := as.double(return)][
  # replace output values greater q90 in an update non-equi join
  agg, on = .(code,action, return > q90), output := as.double(med)][
    # remove helper column
    , zero_before := NULL]

If I understand correctly, the OP wants to compute the median and the 75% quantile of return within each group, based on all zero action rows where return is greater than 0. Then, any return value in a zero action row that exceeds the 75% quantile of the respective group is to be replaced by the group median.

The code can be greatly simplified, as we do not have to distinguish between zero action rows before and after the action rows.
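As for the error in the question, a likely cause is that agg there was grouped by code only and carries zero_before as a helper column, so it has no action column for the on = clause to match. The sketch below reproduces this with toy stand-ins. The names agg_q and mydat_q are hypothetical, and the exact wording of the error message may differ between data.table versions:

```r
library(data.table)

# Toy stand-in for the question's agg: grouped by code only, with zero_before
agg_q <- data.table(code = c(123L, 124L), med = 20.5, q90 = 42.25,
                    zero_before = 0)

# Toy stand-in for the data: zero-action rows only
mydat_q <- data.table(code = rep(c(123L, 124L), each = 2L),
                      action = 0L,
                      return = c(23L, 100L, 23L, 100L))

# The join names 'action', but agg_q has no such column
has_action <- "action" %in% names(agg_q)
print(has_action)

# Capture the error message instead of stopping
res <- tryCatch(
  mydat_q[agg_q, on = .(code, action, return > q90), output := med],
  error = function(e) conditionMessage(e)
)
print(res)
```

Grouping the aggregate by both code and action gives agg the column the join needs.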

The code below reproduces the expected result:

library(data.table)
library(magrittr)
# compute median and 75% quantile of the positive returns in the zero action rows of each code
agg <- setDT(mydat)[action == 0 & return > 0, 
                    quantile(return, c(0.5, 0.75)) %>% 
                      as.list()  %>% 
                      set_names(c("med", "q75")), by = .(code, action)]

# append output column
mydat[, output := as.double(return)][
  # replace output values greater q75 in an update non-equi join
  agg, on = .(code, action, return > q75), output := as.double(med)]
mydat[]
    code action return output
 1:  123      0     -1   -1.0
 2:  123      0      0    0.0
 3:  123      0     23   23.0
 4:  123      0    100   20.5
 5:  123      0     18   18.0
 6:  123      0     15   15.0
 7:  123      1     -1   -1.0
 8:  123      1      0    0.0
 9:  123      1     23   23.0
10:  123      1    100  100.0
11:  123      1     18   18.0
12:  123      1     15   15.0
13:  124      0     -1   -1.0
14:  124      0      0    0.0
15:  124      0     23   23.0
16:  124      0    100   20.5
17:  124      0     18   18.0
18:  124      0     15   15.0
19:  124      1     -1   -1.0
20:  124      1      0    0.0
21:  124      1     23   23.0
22:  124      1    100  100.0
23:  124      1     18   18.0
24:  124      1     15   15.0
    code action return output
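The same rule can also be sketched without magrittr or a join, using a grouped update by reference. This is an alternative formulation and not the approach shown above. The table mydat2 is a hypothetical rebuild of the sample data, and the guard for groups without positive returns is an added assumption:

```r
library(data.table)

# Rebuild the sample data: 2 codes x 2 actions x 6 returns
mydat2 <- data.table(
  code   = rep(c(123L, 124L), each = 12L),
  action = rep(rep(0:1, each = 6L), times = 2L),
  return = rep(c(-1L, 0L, 23L, 100L, 18L, 15L), times = 4L)
)

# Start with output equal to return for every row
mydat2[, output := as.double(return)]

# Within the zero-action rows of each code, replace values above the
# 75% quantile of the positive returns by the median of the positive returns
mydat2[action == 0, output := {
  pos <- return[return > 0]
  if (length(pos) == 0L) {
    as.double(return)  # assumption: leave the group unchanged if nothing is positive
  } else {
    q75 <- quantile(pos, 0.75)
    med <- median(pos)
    ifelse(return > q75, med, as.double(return))
  }
}, by = code]

print(mydat2)
```

Negative and zero returns can never exceed a quantile computed from positive values, so they stay untouched without any extra filtering.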
