R: how to rank longitudinal data

> dput(subset)
structure(list(MEMORY1 = c(1L, 1L, 1L, 1L, 2L), MEMORY2 = c(1L, 
1L, 1L, 1L, 1L), MEMORY3 = c(1L, 2L, 1L, 1L, 1L), MEMORY4 = c(2L, 
2L, 2L, 2L, 2L), MEMORY5 = c(1L, 2L, 1L, 2L, 1L), MEMORY6 = c(1L, 
1L, 2L, 1L, 2L), MEMORY7 = c(2L, 2L, 2L, 2L, 1L), MEMORY8 = c(1L, 
1L, 1L, 1L, 1L)), .Names = c("MEMORY1", "MEMORY2", "MEMORY3", 
"MEMORY4", "MEMORY5", "MEMORY6", "MEMORY7", "MEMORY8"), row.names = c(NA, 
-5L), class = "data.frame")

> subset
  MEMORY1 MEMORY2 MEMORY3 MEMORY4 MEMORY5 MEMORY6 MEMORY7 MEMORY8
1       1       1       1       2       1       1       2       1
2       1       1       2       2       2       1       2       1
3       1       1       1       2       1       2       2       1
4       1       1       1       2       2       1       2       1
5       2       1       1       2       1       2       1       1

My data has 8 items (columns) recorded at 5 time intervals (rows). I would like to rank the columns as follows: 1) if a column contains only 1s, it gets rank 8; 2) otherwise, its rank is the row at which a number greater than 1 first appears (5 for MEMORY1, 2 for MEMORY3, 1 for MEMORY4, and so forth). I wrote the following loop to do this.

ranks = rep(0, 8)
for(i in 1:8){
  v = which(subset[i] > 1)
  if(length(v) == 0){
    ranks[i] = 8
  }else ranks[i] = v[1]
}
> ranks
[1] 5 8 2 1 2 3 1 8

This works, but I realized that it does not handle ties the way I want. Since MEMORY4 and MEMORY7 are both ranked 1, I want MEMORY3 and MEMORY5 to be ranked 3 instead of 2. In that case MEMORY6 should be ranked 5, not 3, and MEMORY1 should be ranked 6, not 5. So the desired ranking is:

6 8 3 1 3 5 1 8

One option is to loop over the columns of 'df1' (the data frame from the question) with sapply and get the first position where the value is greater than 1. If a column has no value greater than 1, the result is NA. Then we take the rank of 'indx' with ties.method set to 'min', so tied columns share the lowest rank and the following ranks are skipped; the result is 'indx1'. rank keeps NA values as NA by default, so as the last step the positions where 'indx' is NA are replaced by 8.

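 df1 <- subset  # assumption: 'df1' is the data frame shown in the question
 # first row in each column with a value > 1 (NA if there is none)
 indx <- sapply(df1, function(x) which(x>1)[1L])
 # tied columns share the minimum rank; NA stays NA
 indx1 <- as.vector(rank(indx, ties.method='min'))
 # columns with no value > 1 get rank 8
 indx1[is.na(indx)] <- 8
 indx1
 #[1] 6 8 3 1 3 5 1 8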
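
For reuse, the same steps can be wrapped in a small function. The following is only a sketch under a few assumptions: the name rank_first_event and its arguments threshold and no_event_rank are hypothetical and not part of the original answer, the rank for columns that never exceed the threshold defaults to the number of columns rather than a hard-coded 8, and the data frame is rebuilt under the name memory so that it does not share a name with base R's subset function.

```r
# Hypothetical helper (not from the original answer): rank columns by the
# first row at which a value exceeds `threshold`. Columns that never exceed
# it receive `no_event_rank`, which defaults to the number of columns.
rank_first_event <- function(df, threshold = 1, no_event_rank = ncol(df)) {
  # First row index above the threshold in each column; NA if there is none
  first_pos <- vapply(df, function(x) which(x > threshold)[1L], integer(1))
  # Tied columns share the minimum rank; NA positions are kept as NA
  ranks <- rank(first_pos, ties.method = "min", na.last = "keep")
  # Columns with no value above the threshold get the fallback rank
  ranks[is.na(first_pos)] <- no_event_rank
  ranks
}

# The data from the question, rebuilt under a different name
memory <- data.frame(
  MEMORY1 = c(1L, 1L, 1L, 1L, 2L),
  MEMORY2 = c(1L, 1L, 1L, 1L, 1L),
  MEMORY3 = c(1L, 2L, 1L, 1L, 1L),
  MEMORY4 = c(2L, 2L, 2L, 2L, 2L),
  MEMORY5 = c(1L, 2L, 1L, 2L, 1L),
  MEMORY6 = c(1L, 1L, 2L, 1L, 2L),
  MEMORY7 = c(2L, 2L, 2L, 2L, 1L),
  MEMORY8 = c(1L, 1L, 1L, 1L, 1L)
)

result <- rank_first_event(memory)
print(unname(result))
#[1] 6 8 3 1 3 5 1 8
```

Unlike the as.vector version above, the returned vector keeps the column names, so a single column can be looked up with result["MEMORY6"].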
