Assigning rank of values within groups with NAs

I have a data frame (df) like the following, which is just a sample:

group value
1     12.1
1     10.3
1     NA
1     11.0
1     13.5
2     11.7
2     NA
2     10.4
2     9.7

That is:

df<-data.frame(group=c(1,1,1,1,1,2,2,2,2), value=c(12.1, 10.3, NA, 11.0, 13.5, 11.7, NA, 10.4, 9.7))

The desired output is:

group value  order
1     12.1    3
1     10.3    1
1     NA      NA
1     11.0    2
1     13.5    4
2     11.7    3
2     NA      NA
2     10.4    2
2     9.7     1

Namely, I want to find the

  • rank of each "value", starting from the smallest value
  • within each "group".

How can I do that in R? I would be very glad for any help. Thanks a lot.

We could use ave from base R to create the rank column ("order1") of "value" by "group". By default, rank() places NAs last and still assigns them a rank, so if the rows where "value" is NA should get NA as their rank, this can be set afterwards ( df$order1[is.na(df$value)] <- NA ).

 df$order1 <- with(df, ave(value, group, FUN=rank))
 df$order1[is.na(df$value)] <- NA
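
As a possible variation on the two lines above, rank() accepts na.last = "keep", which leaves missing values as NA in the result. The sketch below assumes the example data from the question and folds the NA handling into the ave() call; it is one way to get the same column, not the answerer's original code.

```r
# Example data from the question
df <- data.frame(
  group = c(1, 1, 1, 1, 1, 2, 2, 2, 2),
  value = c(12.1, 10.3, NA, 11.0, 13.5, 11.7, NA, 10.4, 9.7)
)

# rank() with na.last = "keep" returns NA for missing values,
# so no separate NA-patching step is needed afterwards
df$order1 <- ave(df$value, df$group,
                 FUN = function(x) rank(x, na.last = "keep"))

print(df)
```

Ties, if any, would receive averaged ranks here, since rank() is left at its default ties.method.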

Or using data.table:

 library(data.table)
 setDT(df)[, order1:=rank(value)* NA^(is.na(value)), by = group][]
 #    group value order1
 #1:     1  12.1      3
 #2:     1  10.3      1
 #3:     1    NA     NA
 #4:     1  11.0      2
 #5:     1  13.5      4
 #6:     2  11.7      3
 #7:     2    NA     NA
 #8:     2  10.4      2
 #9:     2   9.7      1
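
The NA^(is.na(value)) factor in the call above may look cryptic. A small base R sketch, using only the group 1 values from the question, shows what it evaluates to: NA^0 is 1 and NA^1 is NA, so multiplying by it blanks out the ranks of missing values and leaves the others unchanged. The variable names mask and ranks are introduced here purely for illustration.

```r
# Group 1 values from the question
value <- c(12.1, 10.3, NA, 11.0, 13.5)

# is.na() gives FALSE/TRUE, which act as exponents 0/1:
# NA^0 is 1 (value present), NA^1 is NA (value missing)
mask <- NA^(is.na(value))

# Default rank() puts the NA last, so it gets the highest rank
ranks <- rank(value)

# Multiplying by the mask replaces that rank with NA
order1 <- ranks * mask
print(order1)
```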

You can apply the rank() function to one group at a time to get your desired result. My solution is to write a small helper function and call it in a for loop. I'm sure there are more elegant ways using various R libraries, but here is a solution using only base R.

# Load the example data (the author's local CSV copy of the data frame)
df <- read.table('~/Desktop/stack_overflow28283818.csv', sep = ',', header = TRUE)

# Helper function: ranks of 'value' within a single group
rankByGroup <- function(df = NULL, grp = 1) {
  rank(df[df$group == grp, 'value'])
}

# Split off the rows with NA values so they are not ranked
df.na <- df[is.na(df$value), ]
df.0  <- df[!is.na(df$value), ]

# Loop over the groups and fill in the ranks
for (grp in unique(df.0$group)) {
  df.0[df.0$group == grp, 'order'] <- rankByGroup(df.0, grp)
  print(grp)  # progress output
}

# Append the NA rows, with NA as their rank
df.na$order <- NA
df.out <- rbind(df.0, df.na)

# Re-sort into the row order given in the OP (probably not really required)
df.out <- df.out[order(as.numeric(rownames(df.out))), ]

This gives exactly the desired output, although I suspect that keeping the NAs in their original positions may not be necessary for your application.

> df.out
  group value order
1     1  12.1     3
2     1  10.3     1
3     1    NA    NA
4     1  11.0     2
5     1  13.5     4
6     2  11.7     3
7     2    NA    NA
8     2  10.4     2
9     2   9.7     1
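
For comparison, the loop-based approach can likely be condensed with split(), lapply() and unsplit(), still in base R. This is a sketch under the assumption that df holds the example data from the question (built inline here rather than read from a file); ranks_by_group is a name made up for illustration. Because rank() is called with na.last = "keep", the NA rows stay in place and no re-sorting is needed.

```r
# Example data from the question, built inline instead of read from a CSV
df <- data.frame(
  group = c(1, 1, 1, 1, 1, 2, 2, 2, 2),
  value = c(12.1, 10.3, NA, 11.0, 13.5, 11.7, NA, 10.4, 9.7)
)

# Split the values by group and rank each piece; NAs keep rank NA
ranks_by_group <- lapply(split(df$value, df$group), rank, na.last = "keep")

# Put the per-group ranks back into the original row order
df$order <- unsplit(ranks_by_group, df$group)

print(df)
```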
