Ranking within group and keeping id in R data.table

I have a data.table with two grouping variables. I want to calculate rankings within the first grouping variable (group) while still keeping the second one (id) in the result.

library(data.table)
library(dplyr)  # provides percent_rank()

set.seed(1)
DT <- data.table(group = c(rep(1,5), rep(2, 5)), 
                 id = c(letters[1:5], letters[1:5]),
                 var1 = rnorm(10),
                 var2 = runif(10))
# > DT
#      group id       var1       var2
#  1:      1  a -0.6264538 0.93470523
#  2:      1  b  0.1836433 0.21214252
#  3:      1  c -0.8356286 0.65167377
#  4:      1  d  1.5952808 0.12555510
#  5:      1  e  0.3295078 0.26722067
#  6:      2  a -0.8204684 0.38611409
#  7:      2  b  0.4874291 0.01339033
#  8:      2  c  0.7383247 0.38238796
#  9:      2  d  0.5757814 0.86969085
# 10:      2  e -0.3053884 0.34034900

I can calculate the rankings within each group using

DT[, lapply(.SD, function(x) percent_rank(x)), 
   .SDcols = c("var1", "var2"), by = .(group)]

#     group var1 var2
#  1:     1 0.25 1.00
#  2:     1 0.50 0.25
#  3:     1 0.00 0.75
#  4:     1 1.00 0.00
#  5:     1 0.75 0.50
#  6:     2 0.00 0.75
#  7:     2 0.50 0.00
#  8:     2 1.00 0.50
#  9:     2 0.75 1.00
# 10:     2 0.25 0.25

I would also like to keep the id column in the new table, like this:

#     group id var1 var2
#  1:     1  a 0.25 1.00
#  2:     1  b 0.50 0.25
#  3:     1  c 0.00 0.75
#  4:     1  d 1.00 0.00
#  5:     1  e 0.75 0.50
#  6:     2  a 0.00 0.75
#  7:     2  b 0.50 0.00
#  8:     2  c 1.00 0.50
#  9:     2  d 0.75 1.00
# 10:     2  e 0.25 0.25

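Using data.table

# := overwrites var1 and var2 by reference, so group and id stay in DT.
# by = group is needed so that the ranks are computed within each group.
DT[, `:=`(var1 = percent_rank(var1),
          var2 = percent_rank(var2)),
   by = group]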
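
Because := modifies DT in place, the original var1 and var2 values are lost. If they are still needed, a new table can be built instead by returning id alongside the ranked columns in j. The following is only a minimal sketch: pct_rank is a hypothetical helper (not part of data.table) that assumes no NA values and uses the (min rank - 1) / (n - 1) formula, and res is an illustrative name.

```r
library(data.table)

set.seed(1)
DT <- data.table(group = c(rep(1, 5), rep(2, 5)),
                 id    = c(letters[1:5], letters[1:5]),
                 var1  = rnorm(10),
                 var2  = runif(10))

# Hypothetical helper: percent rank as (min rank - 1) / (n - 1).
# Assumes x has no NA values and at least two elements.
pct_rank <- function(x) (rank(x, ties.method = "min") - 1) / (length(x) - 1)

cols <- c("var1", "var2")

# Return id together with the ranked columns; DT itself is left untouched.
res <- DT[, c(list(id = id), lapply(.SD, pct_rank)),
          by = group, .SDcols = cols]
res
```

Columns outside .SDcols can still be referenced by name in j, which is why id can be returned here without being ranked.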

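Using dplyr

# group_by() is needed so that the ranks are computed within each group;
# mutate() keeps every other column, including id.
DT %>%
  group_by(group) %>%
  mutate(var1 = percent_rank(var1),
         var2 = percent_rank(var2))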
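
With more than a couple of columns to rank, listing each one by hand gets repetitive. One possible way to shorten it, assuming dplyr 1.0.0 or later (where across() is available), is sketched below. The name ranked is illustrative, and the result is a tibble rather than a data.table.

```r
library(data.table)
library(dplyr)

set.seed(1)
DT <- data.table(group = c(rep(1, 5), rep(2, 5)),
                 id    = c(letters[1:5], letters[1:5]),
                 var1  = rnorm(10),
                 var2  = runif(10))

# Apply percent_rank() to every listed column within each group.
# Unlisted columns (group, id) are carried along unchanged.
ranked <- DT %>%
  group_by(group) %>%
  mutate(across(c(var1, var2), percent_rank)) %>%
  ungroup()
ranked
```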
