
R max function ignore NA

I have the working code below. When I replicate the same thing on a different data set, I get errors :(

#max by values
df <- data.frame(age=c(5,NA,9), marks=c(1,2,7), story=c(2,9,NA))
df

df$colMax <- apply(df[,1:3], 1, function(x) max(x[x != 9],na.rm=TRUE))
df

I tried to do the same on a bigger data set and I am getting warnings. Why?

maindata$max_pc_age <- apply(maindata[,c(paste("Q2",1:18,sep="_"))], 1, function(x) max(x[x != 9],na.rm=TRUE))


50: In max(x[x != 9], na.rm = TRUE) :
  no non-missing arguments to max; returning -Inf

In order to understand the problem better I made the change below, but I am still getting warnings:

maindata$max_pc_age <- apply(maindata[,c(paste("Q2",1:18,sep="_"))], 1, function(x) max(x,na.rm=TRUE))
1: In max(x, na.rm = TRUE) : no non-missing arguments to max; returning -Inf
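The warning fires once for every row that has nothing left after the NAs (and, in the first version, the 9s) are dropped. A minimal sketch for locating such rows, assuming a small made-up stand-in for maindata (the `toy` data frame and its values are hypothetical):

```r
# Hypothetical stand-in for maindata: row 2 is all NA, row 3 is only NA and 9
toy  <- data.frame(Q2_1 = c(1, NA, 9, 3), Q2_2 = c(4, NA, NA, NA))
cols <- paste("Q2", 1:2, sep = "_")
vals <- toy[, cols]

# Rows where every value is NA: max(x, na.rm = TRUE) warns on these
only_na <- unname(which(rowSums(!is.na(vals)) == 0))

# Rows where every value is NA or 9: max(x[x != 9], na.rm = TRUE) warns on these
na_or_9 <- unname(which(rowSums(!is.na(vals) & vals != 9) == 0))

only_na
na_or_9
```

Running the same two checks on the real columns should show which rows are behind the warnings.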

It seems the problem has already been pointed out in the comments: since some vectors contain only NAs, -Inf is reported, which, as I take from the comments, you don't like. In this answer I would like to point out one possible way to tackle the issue, namely to build in a control statement (instead of overwriting -Inf after the fact, which is equally valid). For instance,

 my.max <- function(x) ifelse(!all(is.na(x)), max(x, na.rm = TRUE), NA)

does the trick. If every element in x is NA (checked with all), then NA is returned, and the max otherwise. If you want any other value returned, just exchange NA for that value. You can also build this easily into your apply call, e.g.

 maindata$max_pc_age <- apply(maindata[,c(paste("Q2",1:18,sep="_"))], 1, my.max)
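The guarded maximum can also be combined with the `x != 9` filter from the question. The following is a sketch on the small example data frame rather than on maindata; it uses a plain `if`/`else` in place of `ifelse`, which is a stylistic substitution and not a requirement:

```r
# Guarded max: NA when nothing non-missing is left, the maximum otherwise
my.max <- function(x) if (all(is.na(x))) NA_real_ else max(x, na.rm = TRUE)

df <- data.frame(age = c(5, NA, 9), marks = c(-5, NA, 7), story = c(2, 9, NA))

# Drop the value 9 first, then take the guarded maximum of each row
res <- apply(df, 1, function(x) my.max(x[x != 9]))
res
```

Because `all(is.na(x))` is also TRUE for a zero-length vector, a row made up entirely of 9s would come back as NA as well, without a warning.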

I am still sometimes confused by R's treatment of NA and empty sets. A statement like test <- NA; test == NA gives NA as a result (instead of TRUE, as returned by is.na(test)), which is sometimes rationalized by asking: since the values are missing, how could you know that the two missing values are identical? In this case, however, max returns -Inf because, once the NAs are removed, it is given an empty set, which I think is not at all obvious. In my experience, though, when strange and unexpected results pop up, NAs or empty sets are often involved.
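Both behaviours are easy to reproduce in isolation. A short sketch (the variable names are only for illustration):

```r
test <- NA
cmp  <- test == NA      # comparing with NA yields NA, not TRUE
chk  <- is.na(test)     # is.na() is the way to test for missingness

x <- c(NA_real_, NA_real_)
left <- x[!is.na(x)]    # nothing remains once the NAs are removed

# max() of an empty numeric vector is -Inf (with a warning, suppressed here)
empty_max <- suppressWarnings(max(x, na.rm = TRUE))

cmp
chk
length(left)
empty_max
```

One way to read the -Inf result: it is the neutral element for max, so that max(c(a, b)) and max(max(a), max(b)) agree even when one part is empty.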

In cases like below:

df[2,2] <- NA
df[1,2] <- -5

apply(df, 1, function(x) max(x[x != 9],na.rm=TRUE))
#[1]    5 -Inf    7
#Warning message:
#In max(x[x != 9], na.rm = TRUE) :
#  no non-missing arguments to max; returning -Inf

You could do:

df1 <- df  
minVal <- min(df1[!is.na(df1)])-1

df1[is.na(df1)|df1==9] <- minVal
val <- do.call(`pmax`, df1)
val[val==minVal] <- NA
val
#[1]  5 NA  7
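A variation on the same idea that avoids the placeholder value: turn the 9s into NA and let pmax drop missing values itself. This is a sketch of an alternative, not the method used above; with na.rm = TRUE, pmax returns NA only at positions where every input is NA.

```r
df <- data.frame(age = c(5, NA, 9), marks = c(-5, NA, 7), story = c(2, 9, NA))

df1 <- df
# Treat 9 as missing; the !is.na() guard keeps NAs out of the logical index
df1[!is.na(df1) & df1 == 9] <- NA

# Parallel maximum across the columns, ignoring NAs within each row
val <- do.call(pmax, c(df1, na.rm = TRUE))
val
```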

You can use hablar::max_, which returns NA if all values are NA:

apply(df, 1, function(x) hablar::max_(x[x!=9]))
#[1]  5 NA  7

data

df <- structure(list(age = c(5, NA, 9), marks = c(-5, NA, 7), story = c(2, 
9, NA)), row.names = c(NA, -3L), class = "data.frame")

df
#  age marks story
#1   5    -5     2
#2  NA    NA     9
#3   9     7    NA
