Rank function to rank multiple variables in R

I am trying to rank multiple numeric variables (700+ variables) in my data and am not sure exactly how to do this, as I am still fairly new to R.

I do not want to overwrite the original values with the ranks, so I need to create a new rank variable for each of these numeric variables.

From reading other posts, I believe the assign and transform functions, along with rank, may be able to solve this. I tried implementing it as below (sample data and code) but am struggling to get it to work.

In addition to the variables xcount, xvisit and ysales, the output dataset needs to contain the variables xcount_rank, xvisit_rank and ysales_rank holding the ranked values.

input <- read.table(header=F, text="101 2 5 6 
                102 3 4 7 
                103 9 12 15")
colnames(input) <- c("id","xcount","xvisit","ysales")

input1 <- input[,2:4] #need to rank the numeric variables besides id

for (i in 1:3) 
{
  transform(input1, 
            assign(paste(input1[,i],"rank",sep="_")) = 
              FUN = rank(-input1[,i], ties.method = "first"))
}






input[paste(names(input)[2:4], "rank", sep = "_")] <- 
     lapply(input[2:4], cut, breaks = 10)

The problem with this approach is that it creates the rank values as intervals such as (101, 230], (230, 450], whereas I would like the rank variables to be populated with 1, 2 and so on, up to 10 categories, matching the splits I made. Is there any way to achieve this?

     input[5:7] <- lapply(input[5:7], rank, ties.method = "first")

The approach I tried, based on the solutions provided below, is:

   input <- read.table(header=F, text="101 20 5 6 
                102 2 4 7 
                103 9 12 15
                104 100 8 7 
                105 450 12 65 
                109 25 28 145
                112 854 56 93")
   colnames(input) <- c("id","xcount","xvisit","ysales")

   input[paste(names(input)[2:4], "rank", sep = "_")] <- 
           lapply(input[2:4], cut, breaks = 3)

   Current output I get is:
   id xcount xvisit ysales xcount_rank xvisit_rank ysales_rank
    1 101     20      5      6  (1.15,286] (3.95,21.3] (5.86,52.3]
    2 102      2      4      7  (1.15,286] (3.95,21.3] (5.86,52.3]
    3 103      9     12     15  (1.15,286] (3.95,21.3] (5.86,52.3]
    4 104    100      8      7  (1.15,286] (3.95,21.3] (5.86,52.3]
    5 105    450     12     65   (286,570] (3.95,21.3] (52.3,98.7]
    6 109     25     28    145  (1.15,286] (21.3,38.7]  (98.7,145]
    7 112    854     56     93   (570,855] (38.7,56.1] (52.3,98.7]

    Desired output:
     id xcount xvisit ysales xcount_rank xvisit_rank ysales_rank
     1 101     20      5      6  1           1           1
     2 102      2      4      7  1           1           1
     3 103      9     12     15  1           1           1
     4 104    100      8      7  1           1           1
     5 105    450     12     65  2           1           2
     6 109     25     28    145  1           2           3

I would like to see which group each record falls into when the values are ranked by interval.

Using dplyr

 library(dplyr)
  nm1 <- paste("rank", names(input)[2:4], sep="_")
  input[nm1] <-  mutate_each(input[2:4],funs(rank(., ties.method="first")))
  input
  #   id xcount xvisit ysales rank_xcount rank_xvisit rank_ysales
  #1 101      2      5      6           1           2           1
  #2 102      3      4      7           2           1           2
  #3 103      9     12     15           3           3           3
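
In later dplyr releases mutate_each() and funs() are deprecated, so the call above may emit warnings. A minimal sketch of the same ranking with across(), assuming dplyr 1.0.0 or later is installed; the helper name num_cols is only illustrative and not part of the original answer:

```r
library(dplyr)

# Sample data from the question
input <- read.table(header = FALSE, text = "101 2 5 6
102 3 4 7
103 9 12 15")
colnames(input) <- c("id", "xcount", "xvisit", "ysales")

# Columns to rank: everything except id (illustrative helper name)
num_cols <- names(input)[2:4]

# across() applies rank() to each selected column; .names builds the
# new column names so the original columns are kept, not overwritten
input <- mutate(
  input,
  across(all_of(num_cols),
         ~ rank(.x, ties.method = "first"),
         .names = "rank_{.col}")
)
input
```

To rank in descending order, as in the question's attempt, rank(-.x, ties.method = "first") can be used instead.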

Update

Based on the new input, using cut with labels=FALSE so that integer bin codes are returned instead of interval labels

  input[nm1] <- mutate_each(input[2:4], funs(cut(., breaks=3, labels=FALSE)))
  input
  #   id xcount xvisit ysales rank_xcount rank_xvisit rank_ysales
  #1 101     20      5      6           1           1           1
  #2 102      2      4      7           1           1           1
  #3 103      9     12     15           1           1           1
  #4 104    100      8      7           1           1           1
  #5 105    450     12     65           2           1           2
  #6 109     25     28    145           1           2           3
  #7 112    854     56     93           3           3           2
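
The same binning can be sketched in base R without dplyr. This is only an illustration: it follows the _rank suffix naming from the question, and the helper names num_cols and rank_cols are assumptions. Passing labels = FALSE makes cut() return the integer bin codes rather than a factor of interval labels.

```r
# Seven-row sample data from the question
input <- read.table(header = FALSE, text = "101 20 5 6
102 2 4 7
103 9 12 15
104 100 8 7
105 450 12 65
109 25 28 145
112 854 56 93")
colnames(input) <- c("id", "xcount", "xvisit", "ysales")

num_cols  <- names(input)[2:4]                    # columns to bin
rank_cols <- paste(num_cols, "rank", sep = "_")   # new column names

# breaks = 3 splits each column's range into 3 equal-width intervals;
# labels = FALSE returns the interval number (1, 2 or 3) for each row
input[rank_cols] <- lapply(input[num_cols], cut, breaks = 3, labels = FALSE)
input
```

Note that cut() with a single number for breaks produces equal-width bins, so most rows can land in the first bin when a column is skewed, as xcount is here.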
