
Creating percentile categories in R

I have the following data:

len = 1000
vint1 = sample(1:150,len,replace=TRUE)
vch1=sample(LETTERS[1:5],len,replace=TRUE)
vbin1=sample(letters[1:2],len,replace=TRUE)
mydf = data.frame(vint1, vch1, vbin1)

But I have to create another column, 'category', whose entries follow these rules:

'N' if < 90th percentile 
'cat1' if >=90th and <95th percentile
'cat2' if >=95th and <99th percentile
'cat3' if >=99th percentile

The percentile is always computed within each vch1 and vbin1 group.

I can determine whether a vint1 value is above the 90th percentile of its vch1 and vbin1 group with the following code:

with(mydf, ave(vint1, vch1, vbin1, FUN=function(x) x>quantile(x,0.9)))

But how can I create the categories?

EDIT:

I tried the following code. I would like to confirm whether it is OK or whether there is a better method:

with(mydf, ave(vint1, vch1, vbin1, FUN=function(x) 
    ifelse(x<quantile(x,0.9), 'N',
    ifelse(x<quantile(x,0.95),'cat1',
    ifelse(x<(quantile(x,0.99)),'cat2','cat3'
    )))
    )
)

This is a follow-up question to: Categorize dataframe by percentile in R

This may be helpful. Using group_by and ntile from dplyr, together with your ifelse statement, I came up with the following.

library(dplyr)

group_by(mydf, vch1, vbin1) %>%
mutate(check = ntile(vint1, 100),
       out = ifelse(check > 99, "cat3",
                 ifelse(between(check, 95, 99), "cat2",
                    ifelse(between(check, 90, 95), "cat1", "N")))) %>%
ungroup()

# A part of the outcome
#   vint1 vch1 vbin1 check  out
#1    138    C     b    88    N
#2     66    B     a    39    N
#3     24    D     a    16    N
#4    141    B     a    90 cat1
#5     27    C     a    13    N
#6     29    C     a    16    N
#7     11    D     b     4    N
#8     24    B     b    21    N
#9     72    E     a    46    N
#10    25    C     b    15    N
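
Note that ntile() assigns rank-based buckets rather than comparing values against quantiles, so its bucket numbers may not line up exactly with the percentile rules in the question, particularly when a group has fewer than 100 rows. A minimal base R sketch that compares each value with its own group's quantiles instead; percentile_category is a hypothetical helper name (not from the posts above) and the seed is arbitrary:

```r
# Hypothetical helper: label one group's values by comparing them with that
# group's own 90th, 95th and 99th percentiles (default quantile() type).
percentile_category <- function(x) {
  cuts <- quantile(x, probs = c(0.90, 0.95, 0.99), names = FALSE)
  labels <- c("N", "cat1", "cat2", "cat3")
  # findInterval() counts how many cut points are <= each value (0 to 3),
  # so intervals are closed on the left: x >= 90th percentile gives "cat1".
  labels[findInterval(x, cuts) + 1L]
}

set.seed(1)  # arbitrary seed, only for reproducibility
len <- 1000
mydf <- data.frame(
  vint1 = sample(1:150, len, replace = TRUE),
  vch1  = sample(LETTERS[1:5], len, replace = TRUE),
  vbin1 = sample(letters[1:2], len, replace = TRUE)
)

# Split by group, label each group, then restore the original row order.
g <- interaction(mydf$vch1, mydf$vbin1, drop = TRUE)
mydf$category <- unsplit(lapply(split(mydf$vint1, g), percentile_category), g)

head(mydf)
```

Because findInterval() accepts repeated cut points, this also works when two of the percentiles coincide within a small group, where cut() would stop with a "breaks are not unique" error.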

IDEA

transform(mydf,
          check = ave(vint1, vch1, vbin1, FUN=function(x){
                       ifelse(x<quantile(x,0.9), 'N',
                       ifelse(x<quantile(x,0.95),'cat1',
                       ifelse(x<(quantile(x,0.99)),'cat2','cat3'
                    )))  
                  })
          )

#  vint1 vch1 vbin1 check
#1    90    D     b     N
#2   136    C     b  cat1
#3    55    B     a     N
#4    56    B     b     N
#5    56    D     a     N
#6   100    A     b     N
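
One way to sanity-check the result is to tabulate the categories per group. The sketch below reuses the nested ifelse from above; the names classify and counts and the seed are illustrative assumptions, not part of the original answer:

```r
set.seed(42)  # arbitrary seed, only for reproducibility
len <- 1000
mydf <- data.frame(
  vint1 = sample(1:150, len, replace = TRUE),
  vch1  = sample(LETTERS[1:5], len, replace = TRUE),
  vbin1 = sample(letters[1:2], len, replace = TRUE)
)

# Same nested ifelse as in the answer, wrapped in a named function.
classify <- function(x) {
  ifelse(x < quantile(x, 0.90), "N",
  ifelse(x < quantile(x, 0.95), "cat1",
  ifelse(x < quantile(x, 0.99), "cat2", "cat3")))
}

# ave() applies classify() within each vch1/vbin1 group; because the function
# returns strings, the result comes back as a character vector.
mydf$check <- ave(mydf$vint1, mydf$vch1, mydf$vbin1, FUN = classify)

# An explicit level order keeps the table columns in rule order.
mydf$check <- factor(mydf$check, levels = c("N", "cat1", "cat2", "cat3"))

counts <- table(group = interaction(mydf$vch1, mydf$vbin1),
                category = mydf$check)
print(counts)
```

Each non-empty group should show most rows under N and at least one row under cat3, since a group's maximum is never below its own 99th percentile.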

I hope this helps.

library(reshape2)  # provides melt()

# Percentiles (0%, 1%, ..., 100%) of vint1 for each group of "vch1" and "vbin1"
group_percentile <- do.call(rbind, with(mydf, tapply(vint1, interaction(vch1, vbin1), quantile, probs = seq(0, 1, .01))))

# Convert to a data frame; the percentile columns are named X0., X1., ...
group_percentile <- data.frame(group_percentile)

# Store the group label (the row name) in a new column
group_percentile$group <- rownames(group_percentile)

# Reshape to long format so it can be used as a lookup table
group_percentile2 <- melt(group_percentile, id.vars = "group")

# Create the same "vch1"."vbin1" group label in mydf
mydf$group <- paste(mydf$vch1, mydf$vbin1, sep = ".")

# Attach the percentile label to rows whose vint1 equals a percentile value
mydf2 <- merge(mydf, group_percentile2, by.x = c("vint1", "group"), by.y = c("value", "group"))

> head(mydf2)  # "variable" is the percentile
  vint1 group vch1 vbin1 variable
1     1   B.a    B     a      X0.
2     1   D.a    D     a      X0.
3     1   D.a    D     a      X1.
4     1   D.a    D     a      X0.
5     1   D.a    D     a      X1.
6     1   D.b    D     b      X0.
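
Note that merge() with its default settings keeps only the rows whose vint1 exactly equals one of the computed percentile values, and it repeats a row that matches more than one percentile (as with D.a above). A rough alternative sketch, assuming what is wanted is a percentile rank for every row: use the group's empirical CDF. The column name pct_rank and the seed are illustrative assumptions, and this rank (percentage of group values less than or equal to the row's value) is a different definition from comparing against quantile() cut points:

```r
set.seed(7)  # arbitrary seed, only for reproducibility
len <- 1000
mydf <- data.frame(
  vint1 = sample(1:150, len, replace = TRUE),
  vch1  = sample(LETTERS[1:5], len, replace = TRUE),
  vbin1 = sample(letters[1:2], len, replace = TRUE)
)

# One factor level per vch1/vbin1 combination that actually occurs.
g <- interaction(mydf$vch1, mydf$vbin1, drop = TRUE)

# ecdf(x) returns a function; evaluating it at x gives, for each value,
# the share of the group's values that are <= that value.
rank_in_group <- function(x) 100 * ecdf(x)(x)

# Every row keeps its place: split by group, rank, then reassemble.
mydf$pct_rank <- unsplit(lapply(split(mydf$vint1, g), rank_in_group), g)

head(mydf)
```

With this definition the largest value in each group always gets a rank of 100, and no rows are dropped or duplicated.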

