Categorize dataframe by percentile in R
I have the following data:
set.seed(15)
ddf <- data.frame(
  gp1 = sample(1:3, 200, replace=TRUE),
  gp2 = sample(c('a','b'), 200, replace=TRUE),
  param = sample(10:20, 200, replace=TRUE)
)
head(ddf)
gp1 gp2 param
1 2 a 18
2 1 b 11
3 3 a 15
4 2 b 20
5 2 a 17
6 3 b 11
I have to create another column called 'category', which needs to have a value of 1 if 'param' for that row is above the 75th percentile for that row's combination of gp1 and gp2.
I tried the following, but I am not sure if it is correct:
ddf$category = with(ddf, ifelse(param>quantile(ddf[ddf$gp1==gp1 & ddf$gp2==gp2,]$param, .75, na.rm=T), 1, 0) )
Is the above code correct, and if not, how can this be done? Thanks for your help.
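A quick way to check the attempt above: inside with(ddf, ...), the bare name gp1 is the same column as ddf$gp1, so the filter ddf$gp1==gp1 & ddf$gp2==gp2 is TRUE for every row and the quantile is taken over the whole param column rather than per group. The sketch below rebuilds the question's data and compares the attempt with an ungrouped cutoff; the names attempt and overall are only illustrative.

```r
set.seed(15)
ddf <- data.frame(
  gp1 = sample(1:3, 200, replace = TRUE),
  gp2 = sample(c('a', 'b'), 200, replace = TRUE),
  param = sample(10:20, 200, replace = TRUE)
)

# The attempt from the question, unchanged apart from spacing.
attempt <- with(ddf, ifelse(
  param > quantile(ddf[ddf$gp1 == gp1 & ddf$gp2 == gp2, ]$param, .75, na.rm = TRUE),
  1, 0))

# Same comparison against the 75th percentile of the whole column, no grouping.
overall <- ifelse(ddf$param > quantile(ddf$param, .75, na.rm = TRUE), 1, 0)

# TRUE: the attempt never uses the groups.
identical(attempt, overall)
```

So the attempt runs without error, but it flags rows above the overall 75th percentile, not the per-group one.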
(After changing "value" to "param")
ddf = data.frame(gp1, gp2, param)
ddf$category <- with(ddf, ave(param, gp1,gp2,
FUN=function(x) x > quantile(x,.95) ) )
> ddf
gp1 gp2 param category
1 2 a 20 0
2 2 a 16 0
3 1 a 12 0
4 1 b 16 0
5 3 b 19 0
snipped
> sum(ddf$category)
[1] 2
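Note that the code above uses a cutoff of .95, while the question asks for the 75th percentile. A minimal sketch of the same ave() approach with .75, assuming "more than" means strictly greater and that the data is built as in the question, might look like this; the check vector is only there to confirm the grouping and is not part of the original answer.

```r
set.seed(15)
ddf <- data.frame(
  gp1 = sample(1:3, 200, replace = TRUE),
  gp2 = sample(c('a', 'b'), 200, replace = TRUE),
  param = sample(10:20, 200, replace = TRUE)
)

# 1 if param is strictly above the 75th percentile of its gp1/gp2 group, else 0.
ddf$category <- with(ddf, ave(param, gp1, gp2,
  FUN = function(x) as.integer(x > quantile(x, .75))))

# Cross-check with an explicit loop over the row indices of each group.
check <- integer(nrow(ddf))
for (rows in split(seq_len(nrow(ddf)), list(ddf$gp1, ddf$gp2))) {
  check[rows] <- as.integer(ddf$param[rows] > quantile(ddf$param[rows], .75))
}
stopifnot(all(ddf$category == check))

# Number of flagged rows in each gp1/gp2 combination.
aggregate(category ~ gp1 + gp2, data = ddf, FUN = sum)
```

ave() returns a vector aligned with the rows of ddf, so the result can be assigned straight to a new column without merging.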