This problem is similar to my earlier topic "calculation of 90 percentile and replacement of it by median by groups in R", with one distinction.
In that topic, the calculation was done on the 14 zeros preceding the one category of action, while the replacement by the median was done for all rows of the zero category of action, for each group code+item.
Now I use all zeros, not just the 14 preceding ones, and I don't touch negative and zero values of return.
Using the group variable action (0, 1), for the zero category I want to find the 75th percentile of the return variable. If a value is greater than the 75th percentile, it must be replaced by the median of the zero category.
There is also a code variable: this procedure must be performed for each code separately. Note: I don't touch negative and zero values.
mydat=structure(list(code = c(123L, 123L, 123L, 123L, 123L, 123L, 123L,
123L, 123L, 123L, 123L, 123L, 124L, 124L, 124L, 124L, 124L, 124L,
124L, 124L, 124L, 124L, 124L, 124L), action = c(0L, 0L, 0L, 0L,
0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L,
1L, 1L, 1L, 1L), return = c(-1L, 0L, 23L, 100L, 18L, 15L, -1L,
0L, 23L, 100L, 18L, 15L, -1L, 0L, 23L, 100L, 18L, 15L, -1L, 0L,
23L, 100L, 18L, 15L)), .Names = c("code", "action", "return"), class = "data.frame", row.names = c(NA,
-24L))
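For each code, the positive returns in the zero category are 23, 100, 18 and 15.
Their 75th percentile is 42.25 and their median is 20.5, so 100 must be replaced by 20.5.
How can I get the following output (replaced values are marked with ***)?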
code action return
123 0 -1
123 0 0
123 0 23
123 0 ***20.5
123 0 18
123 0 15
123 1 -1
123 1 0
123 1 23
123 1 100
123 1 18
123 1 15
124 0 -1
124 0 0
124 0 23
124 0 ***20.5
124 0 18
124 0 15
124 1 -1
124 1 0
124 1 23
124 1 100
124 1 18
124 1 15
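The 75th percentile (42.25) and the median (20.5) used in the expected output can be checked in base R. This is a minimal sketch, assuming the positive zero-category returns 23, 100, 18, 15 from the sample data and R's default quantile method (type 7, linear interpolation); the variable names are illustrative only.

```r
# Positive returns in the zero-action rows of one code (taken from the sample data).
positive_returns <- c(23, 100, 18, 15)

# quantile() uses linear interpolation (type 7) by default.
q75 <- unname(quantile(positive_returns, 0.75))
med <- median(positive_returns)

print(q75)  # 42.25
print(med)  # 20.5
```

Only 100 exceeds 42.25, so it is the only value replaced by 20.5 in each code.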
Using Uwe's great solution from that topic (my adapted code follows the error message), I get this error:
Error in `[.data.table`(mydat[action == 0, `:=`(output, as.double(return))], :
Column(s) [action] not found in i
library(data.table)
# mark the zero action rows before the action period
setDT(mydat)[, zero_before := cummax(action), by = .(code)]
# compute median and 90% quantile for that last 14 rows before each action period
agg <- mydat[zero_before == 0,
             quantile(tail(return), c(0.5, 0.75)) %>%
               as.list() %>%
               set_names(c("med", "q90")) %>%
               c(.(zero_before = 0)), by = .(code)]
agg
# append output column
mydat[action == 0, output := as.double(return)][
  # replace output values greater q90 in an update non-equi join
  agg, on = .(code,action, return > q90), output := as.double(med)][
    # remove helper column
    , zero_before := NULL]
If I understand correctly, the OP wants to compute the median and the 75% quantile of return within each group, based on all zero action rows where the return is greater than 0. Then, any return value in a zero action row which exceeds the 75% quantile of the respective group is to be replaced by the group median.
The code can be largely simplified, as we do not have to distinguish between zero action rows before and after the action rows.
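As a side note on the reported error: in a data.table join `x[i, on = ...]`, every column named in `on` has to exist in the lookup table `i`. In the question's attempt the aggregate was grouped by `code` only, so it had no `action` column to join on. The sketch below reproduces that situation on toy data; the values and the names `x`, `agg_no_action` and `agg_ok` are illustrative assumptions, not part of the original code.

```r
library(data.table)

# Toy data with the same column names as in the question (values are illustrative).
x <- data.table(code = c(123L, 123L), action = c(0L, 0L), return = c(23, 100))

# Lookup table grouped by code only, so it has no `action` column.
agg_no_action <- x[, .(med = median(return), q75 = quantile(return, 0.75)),
                   by = .(code)]

# Joining on `action` fails because `agg_no_action` lacks that column.
res <- tryCatch(
  x[agg_no_action, on = .(code, action, return > q75), output := med],
  error = function(e) conditionMessage(e)
)
print(res)

# Grouping by both columns puts `action` into the lookup table, and the join works.
agg_ok <- x[, .(med = median(return), q75 = quantile(return, 0.75)),
            by = .(code, action)]
x[agg_ok, on = .(code, action, return > q75), output := med]
print(x)
```

Grouping the aggregate by `.(code, action)` is what makes the `on = .(code, action, ...)` clause valid.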
The code below reproduces the expected result:
library(data.table)
library(magrittr)
# compute median and 75% quantile of the positive returns in the zero action rows of each code
agg <- setDT(mydat)[action == 0 & return > 0,
                    quantile(return, c(0.5, 0.75)) %>%
                      as.list() %>%
                      set_names(c("med", "q75")), by = .(code, action)]
# append output column
mydat[, output := as.double(return)][
  # replace output values greater than q75 in an update non-equi join
  agg, on = .(code, action, return > q75), output := as.double(med)]
mydat[]
    code action return output
 1:  123      0     -1   -1.0
 2:  123      0      0    0.0
 3:  123      0     23   23.0
 4:  123      0    100   20.5
 5:  123      0     18   18.0
 6:  123      0     15   15.0
 7:  123      1     -1   -1.0
 8:  123      1      0    0.0
 9:  123      1     23   23.0
10:  123      1    100  100.0
11:  123      1     18   18.0
12:  123      1     15   15.0
13:  124      0     -1   -1.0
14:  124      0      0    0.0
15:  124      0     23   23.0
16:  124      0    100   20.5
17:  124      0     18   18.0
18:  124      0     15   15.0
19:  124      1     -1   -1.0
20:  124      1      0    0.0
21:  124      1     23   23.0
22:  124      1    100  100.0
23:  124      1     18   18.0
24:  124      1     15   15.0
    code action return output
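For readers who want to cross-check the result without data.table, the same rule can be sketched in base R. This is an alternative illustration, not the method used in the answer: the helper `replace_above_q75` is a hypothetical name, and the data frame is rebuilt here with the same values as the question's `mydat` so that the block is self-contained.

```r
# Same values as the sample data in the question.
mydat <- data.frame(
  code   = rep(c(123L, 124L), each = 12),
  action = rep(rep(c(0L, 1L), each = 6), times = 2),
  return = rep(c(-1, 0, 23, 100, 18, 15), times = 4)
)

# Hypothetical helper: within one code, replace zero-action returns above the
# 75% quantile of the positive zero-action returns by their median.
# All other rows pass through unchanged.
replace_above_q75 <- function(ret, act) {
  eligible <- act == 0 & ret > 0
  out <- as.double(ret)
  if (!any(eligible)) return(out)
  q75 <- quantile(ret[eligible], 0.75)
  med <- median(ret[eligible])
  out[eligible & ret > q75] <- med
  out
}

# Apply the helper to each code separately and write the results back in row order.
mydat$output <- as.double(mydat$return)
for (rows in split(seq_len(nrow(mydat)), mydat$code)) {
  mydat$output[rows] <- replace_above_q75(mydat$return[rows], mydat$action[rows])
}
print(mydat)
```

On the sample data only rows 4 and 16 change (100 becomes 20.5), which agrees with the data.table output above.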