R: new variables according to rank within groups

I have a data frame (df) like the following, which is just a sample:

group value condition   
1     12      1
1     14      1
1     18      1
1     10      0
1     7       1
2     12      1
2     9       0
2     12      1
2     16      1
2     15      0

That is:

df<-data.frame(group=c(1,1,1,1,1,2,2,2,2,2), value=c(12,14,18,10,7,12,9,12,16,15), condition=c(1,1,1,0,1,1,0,1,1,0))

I want to create 3 new columns named "rank1", "rank2" and "rank3", where:

  • rank1 gives the smallest "value" within each "group"
  • rank2 gives the second smallest "value" within each "group"
  • rank3 gives the third smallest "value" within each "group"
  • only values where condition = 1 are considered

That is, the desired output is:

group rank1 rank2 rank3
1     7     12    14
2     12    12    16

How can I do that with R? I would be very glad for any help. Thanks a lot.

With data.table:

library(data.table)
setDT(df)[condition == 1, 
          setNames(as.list(sort(value)[1:3]), paste0("rank", 1:3)), 
          by = group]
#    group rank1 rank2 rank3
# 1:     1     7    12    14
# 2:     2    12    12    16

Here is one way using dplyr/tidyr:

 library(dplyr)
 library(tidyr)
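 df %>% 
    group_by(group) %>% 
    filter(condition!=0) %>%   # keep only rows where condition is satisfied
    arrange(value) %>% 
    slice(1:3) %>%             # three smallest values per group
    mutate(n=paste0('rank', row_number())) %>% 
    select(-condition) %>% 
    spread(n, value)           # one column per rank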
#    group rank1 rank2 rank3
#1     1     7    12    14
#2     2    12    12    16

Or using data.table:

 library(data.table)
 dcast.data.table(setkey(setDT(df), value)[condition!=0, 
     list(rank=paste0('rank', 1:3), value[1:3]), group], 
           group~rank, value.var='V2')
 #   group rank1 rank2 rank3
 #1:     1     7    12    14
 #2:     2    12    12    16

Or using base R:

 df1 <-  subset(df[order(df$value),], condition!=0  , select=1:2)
 df2 <- subset(transform(df1, .id=ave(group, group, FUN=seq_along)), .id<4)
 reshape(df2, idvar='group', timevar='.id', direction='wide')
 #  group value.1 value.2 value.3
 #5     1       7      12      14
 #6     2      12      12      16
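
The same idea can also be sketched more compactly in base R with split() and sapply(). This is only an illustration, not part of the original answer: the names `ok`, `ranks` and `result` are made up here, and the data frame is rebuilt from the question so the snippet runs on its own. Because `sort(v)[1:3]` indexes past the end of a short vector, a group with fewer than three qualifying values would get NA in the missing ranks.

```r
# Sample data from the question
df <- data.frame(group = c(1,1,1,1,1,2,2,2,2,2),
                 value = c(12,14,18,10,7,12,9,12,16,15),
                 condition = c(1,1,1,0,1,1,0,1,1,0))

# Keep only rows where the condition is satisfied
ok <- df[df$condition == 1, ]

# For each group, take the three smallest values (NA-padded if fewer exist);
# sapply() returns one column per group, so transpose to one row per group
ranks <- t(sapply(split(ok$value, ok$group), function(v) sort(v)[1:3]))
colnames(ranks) <- paste0("rank", 1:3)

result <- data.frame(group = as.numeric(rownames(ranks)), ranks, row.names = NULL)
result
```

With the sample data this should give the same two rows as the desired output in the question, with columns named rank1 to rank3 rather than value.1 to value.3.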

Yet another dplyr answer...

myData <- read.csv(text="group,value,condition
1,12,1
1,14,1
1,18,1
1,10,0
1,7,1
2,12,1
2,9,0
2,12,1
2,16,1
2,15,0")

library(dplyr)
myData %>% filter(condition==1) %>% group_by(group) %>% summarise(rank1=nth(sort(value),1),
                                        rank2=nth(sort(value),2),
                                        rank3=nth(sort(value),3))
