简体   繁体   中英

Data Table Solution In R To Find Group Min/Max

Data

data=data.frame("student"=c(1,1,1,2,2,2,3,3,3),
                "score"=c(NA,7,6,6,1,4,8,NA,3),
                "min"=c(6,6,6,1,1,1,3,3,3),
                "max"=c(7,7,7,6,6,6,8,8,8))

I have columns 'student' and 'score' and wish to use data.table to create 'min' and 'max' which simply put are the minimum and maximum values for each student IGNORING NA Values. If all values are NA then simply list 'NA' as the min/max.

Using data.table

library(data.table)

setDT(data)
data[, c("min", "max"):= list(min(score, na.rm = TRUE), 
                              max(score,  na.rm = TRUE)), student]
data
#   student score min max
#1:       1    NA   6   7
#2:       1     7   6   7
#3:       1     6   6   7
#4:       2     6   1   6
#5:       2     1   1   6
#6:       2     4   1   6
#7:       3     8   3   8
#8:       3    NA   3   8
#9:       3     3   3   8

Or with dplyr

library(dplyr)
data %>%
  group_by(student) %>%
  mutate(min =  min(score, na.rm = TRUE), max = max(score, na.rm = TRUE))

But OP wanted to return NA if all scores for any student were NA. This solution fixes the Inf problem.

data=data.frame("student"=c(1,1,1,2,2,2,3,3,3),
                "score"=c(NA,NA,NA,6,1,4,8,NA,3))
> dt <- data.table(data); dt
   student score
1:       1    NA
2:       1    NA
3:       1    NA
4:       2     6
5:       2     1
6:       2     4
7:       3     8
8:       3    NA
9:       3     3

Create a function to handle the case where all values are NA, to return NA

min.na = function(x) if (all(is.na(x))) x[NA_integer_] else min(x, na.rm = TRUE)
max.na = function(x) if (all(is.na(x))) x[NA_integer_] else max(x, na.rm = TRUE)

dt[, c("min", "max") := list(min.na(score), max.na(score)), by=student]
dt
   student score min max
1:       1    NA  NA  NA
2:       1    NA  NA  NA
3:       1    NA  NA  NA
4:       2     6   1   6
5:       2     1   1   6
6:       2     4   1   6
7:       3     8   3   8
8:       3    NA   3   8
9:       3     3   3   8

Edit : And I'm not sure why you'd want to do this anyway. Combining summary statistics to the original data is bad practice. It results in redundancy/duplication. Surely you just want a separate result for each student:

dt[, .(min=min.na(score), max=max.na(score)), by=student]
   student min max
1:       1  NA  NA
2:       2   1   6
3:       3   3   8

I know this last part is not what was asked, but I always check that what they ask for is what they really wanted. ;)

另一个data.table选项:

setDT(data)[, c("min","max") := as.list(range(score, na.rm=TRUE)), student]

You can do this using the function ave :

data=data.frame("student"=c(1,1,1,2,2,2,3,3,3),
                "score"=c(NA,7,6,6,1,4,8,NA,3))

data$min = ave(data$score, data$student, FUN = function(x){ min(x, na.rm = T) })
data$max = ave(data$score, data$student, FUN = function(x){ max(x, na.rm = T) })

Result:

> data
  student score min max
1       1    NA   6   7
2       1     7   6   7
3       1     6   6   7
4       2     6   1   6
5       2     1   1   6
6       2     4   1   6
7       3     8   3   8
8       3    NA   3   8
9       3     3   3   8

The function ave takes a numeric vector as the first parameter and all following vectors are the grouping variables. The FUN parameter is the function you wish to apply.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM