Creating percentile categories in R
I have the following data:
len = 1000
vint1 = sample(1:150,len,replace=TRUE)
vch1=sample(LETTERS[1:5],len,replace=TRUE)
vbin1=sample(letters[1:2],len,replace=TRUE)
mydf = data.frame(vint1, vch1, vbin1)
But I need to create another column, 'category', whose entries follow these rules:
'N' if < 90th percentile
'cat1' if >=90th and <95th percentile
'cat2' if >=95th and <99th percentile
'cat3' if >=99th percentile
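The percentile should always be computed within the group defined by vch1 and vbin1.
I can determine whether a vint1 value is above the 90th percentile of its vch1 and vbin1 group with the following code.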
with(mydf, ave(vint1, vch1, vbin1, FUN=function(x) x>quantile(x,0.9)))
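But how do I turn this into categories?
Edit:
I tried the following code. I would like to confirm whether it is correct, or whether there is a better way: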
with(mydf, ave(vint1, vch1, vbin1, FUN = function(x)
    ifelse(x < quantile(x, 0.9), 'N',
    ifelse(x < quantile(x, 0.95), 'cat1',
    ifelse(x < quantile(x, 0.99), 'cat2', 'cat3')))
))
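This is a follow-up to the question: Categorizing a data frame by percentage in R
This may help. Using group_by and ntile from dplyr together with your ifelse statement, I came up with the following.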
library(dplyr)
group_by(mydf, vch1, vbin1) %>%
    mutate(check = ntile(vint1, 100),   # rank-based bucket (1 to 100) within each group
           out = ifelse(check > 99, "cat3",
                 ifelse(between(check, 95, 99), "cat2",
                 ifelse(between(check, 90, 94), "cat1", "N")))) %>%
    ungroup()
# Part of the output
#    vint1 vch1 vbin1 check  out
# 1    138    C     b    88    N
# 2     66    B     a    39    N
# 3     24    D     a    16    N
# 4    141    B     a    90 cat1
# 5     27    C     a    13    N
# 6     29    C     a    16    N
# 7     11    D     b     4    N
# 8     24    B     b    21    N
# 9     72    E     a    46    N
# 10    25    C     b    15    N
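One caveat worth checking with this approach: rank-based buckets and quantile thresholds can disagree when values are tied. The base R sketch below illustrates this. The helper rank_bucket is hypothetical and only similar in spirit to ntile (it is not dplyr's exact algorithm), and the toy vector x is made up for illustration.

```r
# Hypothetical helper: split x into n rank-based buckets.
# Ties are broken by position, so equal values can land in different buckets.
rank_bucket <- function(x, n) {
  r <- rank(x, ties.method = "first")
  ceiling(r * n / length(x))
}

x <- c(10, 20, 20, 20, 30, 40)   # toy data with three tied values

# Rank-based buckets: the three 20s do not all share one bucket
buckets <- rank_bucket(x, 3)

# Threshold comparison: tied values always get the same answer
at_or_above_median <- x >= quantile(x, 0.5)

print(data.frame(x, buckets, at_or_above_median))
```

If tied values must always share a category, comparing against quantile() thresholds, as in the question's own code, may be the safer choice.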
An idea:
transform(mydf,
    check = ave(vint1, vch1, vbin1, FUN = function(x) {
        ifelse(x < quantile(x, 0.9), 'N',
        ifelse(x < quantile(x, 0.95), 'cat1',
        ifelse(x < quantile(x, 0.99), 'cat2', 'cat3')))
    })
)
#   vint1 vch1 vbin1 check
# 1    90    D     b     N
# 2   136    C     b  cat1
# 3    55    B     a     N
# 4    56    B     b     N
# 5    56    D     a     N
# 6   100    A     b     N
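A possible variation on the same idea, offered as a sketch rather than a definitive implementation: findInterval() can stand in for the nested ifelse calls. It returns how many of the thresholds each value has reached (0 to 3), which maps directly onto the four labels. The fixed seed, the helper name group_code and the column name category are assumptions made for illustration.

```r
set.seed(1)  # assumption: fixed seed so the example is reproducible
len <- 1000
mydf <- data.frame(
  vint1 = sample(1:150, len, replace = TRUE),
  vch1  = sample(LETTERS[1:5], len, replace = TRUE),
  vbin1 = sample(letters[1:2], len, replace = TRUE)
)

# Hypothetical helper: for one group's values, count how many of the
# group's 90th/95th/99th percentile thresholds each value has reached.
# Intervals are closed on the left, so a value equal to a threshold
# falls into the higher category (same as the x < quantile(...) tests).
group_code <- function(x) {
  findInterval(x, quantile(x, c(0.90, 0.95, 0.99)))
}

# ave() applies the helper within each vch1/vbin1 group and keeps row order
code <- ave(mydf$vint1, mydf$vch1, mydf$vbin1, FUN = group_code)

# Map codes 0..3 to the category labels
mydf$category <- c("N", "cat1", "cat2", "cat3")[code + 1]

print(table(mydf$category))
```

Keeping the ave() result numeric and attaching the labels afterwards avoids relying on ave() to coerce a numeric vector to character.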
I hope this helps.
library(reshape2)  # provides melt()
# Percentiles 0% to 100% of vint1 for each vch1/vbin1 group, one row per group
group_percentile <- do.call(rbind, with(mydf, tapply(vint1, interaction(vch1, vbin1), quantile, probs = seq(0, 1, .01))))
# Convert to a data frame; column names such as "0%" become "X0."
group_percentile <- data.frame(group_percentile)
group_percentile$group <- rownames(group_percentile)  # group label, e.g. "A.a"
# Reshape to long format (group, variable, value) so it can be used as a lookup table
group_percentile2 <- melt(group_percentile, id.vars = "group")
# Build the same group label on the original data
mydf$group <- paste(mydf$vch1, ".", mydf$vbin1, sep = '')
# Attach the percentile label; only rows whose vint1 exactly equals a group percentile value are kept
mydf2 <- merge(mydf, group_percentile2, by.x = c("vint1", "group"), by.y = c("value", "group"))
> head(mydf2)  # 'variable' is the percentile label
  vint1 group vch1 vbin1 variable
1     1   B.a    B     a      X0.
2     1   D.a    D     a      X0.
3     1   D.a    D     a      X1.
4     1   D.a    D     a      X0.
5     1   D.a    D     a      X1.
6     1   D.b    D     b      X0.
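If the goal is a percentile rank for every row, one alternative sketch is to use ecdf() within each group, which avoids matching values exactly in a merge and so keeps one row per observation. Note that an ecdf rank (the share of group values less than or equal to x) is a different definition from the interpolated quantile() thresholds, so categories near a boundary may differ. The fixed seed and the column names pct and category are assumptions made for illustration.

```r
set.seed(1)  # assumption: fixed seed so the example is reproducible
len <- 1000
mydf <- data.frame(
  vint1 = sample(1:150, len, replace = TRUE),
  vch1  = sample(LETTERS[1:5], len, replace = TRUE),
  vbin1 = sample(letters[1:2], len, replace = TRUE)
)

# Percentile rank within each vch1/vbin1 group:
# the fraction of the group's values that are <= the row's value
mydf$pct <- ave(as.numeric(mydf$vint1), mydf$vch1, mydf$vbin1,
                FUN = function(x) ecdf(x)(x))

# Turn the rank into categories using fixed cut points on the 0..1 scale
mydf$category <- c("N", "cat1", "cat2", "cat3")[
  findInterval(mydf$pct, c(0.90, 0.95, 0.99)) + 1
]

print(head(mydf))
```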