Using apply function on a matrix with NA entries

I read data from a CSV file. When I print it in R, I get:

  V1 V2  V3 V4  V5 V6 V7
1 14 25  83 64 987 45 78
2 15 65 789 32  14 NA NA
3 14 67  89 14  NA NA NA

If I want the maximum value in each column, I use this:

apply(df,2,max)

and this is the result:

 V1  V2  V3  V4  V5  V6  V7 
 15  67 789  64  NA  NA  NA 

But this only works for the columns that have no NA. How can I change my code so that columns containing NA are handled too?

You just need to add na.rm=TRUE to your apply call.

apply(df,2,max,na.rm=TRUE)

Note: this assumes every column has at least one non-NA value. If a column is entirely NA, max returns -Inf with a warning (by contrast, sum would return 0).
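A small sketch of the all-NA case, using a made-up data frame. With max and na.rm=TRUE, an all-NA column yields -Inf plus a warning; the helper safe_max below is hypothetical (not part of base R) and returns NA instead.

```r
# Small illustrative data frame; column V3 is entirely NA.
df <- data.frame(V1 = c(14, 15, 14),
                 V2 = c(25, NA, 67),
                 V3 = c(NA_real_, NA_real_, NA_real_))

# Hypothetical helper: return NA instead of -Inf (and a warning)
# when a column has no non-NA values.
safe_max <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) NA_real_ else max(x)
}

result <- apply(df, 2, safe_max)
print(result)
```

Whether NA or -Inf is the better marker for an empty column depends on what the result is used for afterwards.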

EDIT BASED ON COMMENT

fft does not have an na.rm argument. Therefore, you will need to write your own function.

apply(df,2,function(x){fft(x[!is.na(x)])})

For example:

df <- data.frame(matrix(5,5,5))
df[,3] <- NA

> df
  X1 X2 X3 X4 X5
1  5  5 NA  5  5
2  5  5 NA  5  5
3  5  5 NA  5  5
4  5  5 NA  5  5
5  5  5 NA  5  5

> apply(df,2,function(x){fft(x[!is.na(x)])})
$X1
[1] 2.500000e+01+0i 1.776357e-15+0i 1.776357e-15+0i 1.776357e-15+0i
[5] 1.776357e-15+0i

$X2
[1] 2.500000e+01+0i 1.776357e-15+0i 1.776357e-15+0i 1.776357e-15+0i
[5] 1.776357e-15+0i

$X3
complex(0)

$X4
[1] 2.500000e+01+0i 1.776357e-15+0i 1.776357e-15+0i 1.776357e-15+0i
[5] 1.776357e-15+0i

$X5
[1] 2.500000e+01+0i 1.776357e-15+0i 1.776357e-15+0i 1.776357e-15+0i
[5] 1.776357e-15+0i

Another option:

sapply(apply(df,2,na.exclude), fft)

EDIT: the code above may fail if apply() returns a matrix instead of a list, which happens when there are no NAs, for instance. The code below fixes that (here m is the data as a matrix, e.g. as.matrix(df), and max is the function being applied):

sapply(tapply(m, col(m), na.exclude), max)

Interestingly, there is no need to set simplify=FALSE, because the result of tapply() is only simplified if na.exclude() returns a single scalar per column, and in that case sapply works in the same way.
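The difference in what apply() returns can be checked directly. The sketch below uses two made-up matrices, m_full and m_na, which are illustrative names only.

```r
# Illustrative matrices (made up for this sketch).
m_full <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3)   # no NAs
m_na   <- matrix(c(1, NA, 3, 4, 5, 6), nrow = 3)  # one NA in column 1

# With no NAs every column keeps the same length, so apply()
# simplifies the result to a matrix.
r_full <- apply(m_full, 2, na.exclude)
print(is.matrix(r_full))

# With an NA the columns end up with different lengths, so apply()
# returns a list.
r_na <- apply(m_na, 2, na.exclude)
print(is.list(r_na))

# Splitting by column index yields one element per column in both cases.
maxes_full <- sapply(tapply(m_full, col(m_full), na.exclude), max)
maxes_na   <- sapply(tapply(m_na, col(m_na), na.exclude), max)
print(maxes_full)
print(maxes_na)
```

When apply() hands back a matrix, sapply() iterates over its individual elements rather than its columns, which is why the first approach can go wrong.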

Another option; this returns -Inf (with a warning) for any column whose elements are all NA:

df<-structure(list(x = c(10, 12, 13), y = c(12, 13, NA), z = c(NA_real_, 
NA_real_, NA_real_)), .Names = c("x", "y", "z"), row.names = c(NA, 
-3L), class = "data.frame")

kk<-Map(function(x) max(na.omit(df[,x])),as.list(names(df)))
ll<-do.call(rbind,kk)
rownames(ll)<-names(df)

> ll
  [,1]
x   13
y   13
z -Inf

This may only work in more recent versions of R, but you can also do:

apply(df,2,function(x) max(x,na.rm=TRUE))

which returns a vector, or equivalently:

lapply(df,function(x) max(x,na.rm=TRUE))

which returns a list. Note that if any column of df is character, apply() converts the whole data frame to a character matrix, so every column is compared as strings rather than numbers and the results are unreliable. In that case, select the numeric columns first.
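One way to restrict the computation to numeric columns is sketched below. The data frame and the column name label are made up for illustration.

```r
# Illustrative data frame with one character column (made up for this sketch).
df <- data.frame(x = c(10, 12, 13),
                 y = c(12, 13, NA),
                 label = c("a", "b", "c"),
                 stringsAsFactors = FALSE)

# Flag the numeric columns, then compute the maxima on those only.
num_cols <- vapply(df, is.numeric, logical(1))
maxes <- vapply(df[num_cols], max, numeric(1), na.rm = TRUE)
print(maxes)
```

vapply() is used here instead of apply() so the data frame is never coerced to a matrix, and each column keeps its own type.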

Another option is to use the following:

apply(na.omit(df),2,max)

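na.omit(df) removes every row of the data frame df that contains at least one NA, and apply() then returns the max of each column over the remaining complete rows. Be aware that this also discards the non-NA values in the dropped rows, so the result can differ from the per-column na.rm=TRUE approach.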
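To see what na.omit() does in practice, the sketch below rebuilds the data frame from the question and compares row-wise removal with per-column removal. The variable names by_row and by_col are illustrative.

```r
# Data from the question.
df <- data.frame(V1 = c(14, 15, 14),
                 V2 = c(25, 65, 67),
                 V3 = c(83, 789, 89),
                 V4 = c(64, 32, 14),
                 V5 = c(987, 14, NA),
                 V6 = c(45, NA, NA),
                 V7 = c(78, NA, NA))

# Row-wise removal: only row 1 has no NA, so only it survives.
by_row <- apply(na.omit(df), 2, max)

# Column-wise removal: NAs are dropped within each column separately.
by_col <- apply(df, 2, max, na.rm = TRUE)

print(by_row)
print(by_col)
```

For this data the two approaches disagree on V1, V2 and V3, because rows 2 and 3 are dropped entirely by na.omit().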
