Add variables whilst ignoring NAs using the transform function

Question

I have a data frame with a large number of variables. I am creating new variables by adding together some of the old ones. The code I am using to do so is:

name_of_data_frame <- transform(name_of_data_frame, new_variable = var1 + var2 + ...)

When transform comes across an NA in one of the observations, it returns NA in the new variable, even if some of the other variables it was adding were not NA.

For example, if var1 = 4, var2 = 3 and var3 = NA, then var1 + var2 + var3 inside transform gives NA, whereas I would like it to give me 7.

I don't want to recode my NAs to zero within the data frame, as I may need to refer back to the NAs later, and I don't want to confuse them with the observations that were genuinely 0.

Any help on how to get around R treating NAs in the way described above with the transform function would be great (alternative functions would be welcome too).

Please note that I am not always just summing variables that are next to each other. I am also often dividing, multiplying, subtracting and so on.

Answer 1

My first instinct was to suggest using sum(), since then you can use the na.rm argument. However, this doesn't work, because sum() reduces its arguments to a single scalar value, not a vector.
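To make the difference concrete, here is a small sketch with made-up vectors (the names var1, var2, var3 follow the question; the values in the second row are my own assumption):

```r
var1 <- c(4, 1)
var2 <- c(3, 2)
var3 <- c(NA, 5)

# sum() collapses every input into one number, so it cannot fill a column row by row
total <- sum(var1, var2, var3, na.rm = TRUE)
length(total)   # one value for the whole data set, not one per row

# `+` works element by element, but any NA in a row makes that row's result NA
plus <- var1 + var2 + var3
```

So sum() handles the NA but loses the rows, while + keeps the rows but not the NA handling.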

This means you need to write a parallel sum function. Let's call this psum(), by analogy with the base R functions pmin() and pmax():

psum <- function(..., na.rm=FALSE) { 
  x <- list(...)
  rowSums(matrix(unlist(x), ncol=length(x)), na.rm=na.rm)
}

Now set up some data and use psum() to get the desired vector:

dat <- data.frame(
  x = c(1,2,3, NA),
  y = c(NA, 4, 5, NA))

transform(dat, new=psum(x, y, na.rm=TRUE))
   x  y new
1  1 NA   1
2  2  4   6
3  3  5   8
4 NA NA   0
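Note that row 4 in the output above comes out as 0 even though both inputs are NA. If you would rather keep NA when nothing was observed in a row, one possible variant is sketched below. The name psum_na is hypothetical and not part of base R; it assumes all inputs are numeric vectors of equal length.

```r
# Hypothetical variant of psum(): rows where every input is NA stay NA
psum_na <- function(..., na.rm = FALSE) {
  m <- cbind(...)                       # one column per input vector
  s <- rowSums(m, na.rm = na.rm)
  if (na.rm) {
    s[rowSums(!is.na(m)) == 0] <- NA    # no observed values in this row
  }
  s
}

dat <- data.frame(
  x = c(1, 2, 3, NA),
  y = c(NA, 4, 5, NA))

res <- transform(dat, new = psum_na(x, y, na.rm = TRUE))
```

Whether an all-NA row should give 0 or NA depends on what the new variable means in your data, so treat this as an option rather than a fix.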

Similarly, you can define a parallel product, pprod(), like this:

pprod <- function(..., na.rm=FALSE) {
  x <- list(...)
  m <- matrix(unlist(x), ncol=length(x))
  # pass the caller's na.rm through instead of hard-coding TRUE
  apply(m, 1, prod, na.rm=na.rm)
}

transform(dat, new=pprod(x, y, na.rm=TRUE))
   x  y new
1  1 NA   1
2  2  4   8
3  3  5  15
4 NA NA   1

This example of pprod provides a general template for what you want to do: Create a function that uses apply() to summarize a matrix of input into the desired vector.
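As a rough sketch of that template, the summary function itself can be made an argument. The helper name prow is made up for this illustration, and it assumes f accepts an na.rm argument (as sum, prod, mean, min and max do):

```r
# Hypothetical helper: apply any row-wise summary f across parallel vectors
prow <- function(f, ..., na.rm = FALSE) {
  m <- cbind(...)                  # one column per input vector
  apply(m, 1, f, na.rm = na.rm)    # summarise each row with f
}

dat <- data.frame(
  x = c(1, 2, 3, NA),
  y = c(NA, 4, 5, NA))

res <- transform(dat,
  total = prow(sum,  x, y, na.rm = TRUE),
  avg   = prow(mean, x, y, na.rm = TRUE))
```

With mean, a row that is entirely NA comes out as NaN rather than 0, so the result for such rows depends on the function you pass in.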

Answer 2

Using rowSums and prod could help you out.

set.seed(007) # Generating some data
DF <- data.frame(V1=sample(c(50,NA,36,24,80, NA), 15, replace=TRUE),
                 V2=sample(c(70,40,NA,25,100, NA), 15, replace=TRUE),
                 V3=sample(c(20,26,34,15,78,40), 15, replace=TRUE))

transform(DF, Sum=rowSums(DF, na.rm=TRUE)) # Sum (a vector of values)
transform(DF, Prod=apply(DF, 1, FUN=prod, na.rm=TRUE)) # Prod (a vector of values)

# Defining a function for subtracting (resta, in Spanish :D)
# NAs are dropped first, then the remaining values are subtracted left to right
resta <- function(x) Reduce(function(a,b) a-b, x[!is.na(x)])
transform(DF, Subtracting=apply(DF, 1, resta))

# Defining a function for dividing
# NAs are dropped first, then the remaining values are divided left to right
div <- function(x) Reduce(function(a,b) a/b, x[!is.na(x)])
transform(DF, Division=apply(DF, 1, div))
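The calls above use every column of DF. Since the question mentions combining variables that are not next to each other, here is a small sketch of restricting the sum to chosen columns. The data values and the column name Sum13 are assumptions for illustration only.

```r
DF <- data.frame(V1 = c(50, NA, 36),
                 V2 = c(70, 40, NA),
                 V3 = c(20, 26, 34))

# Sum only V1 and V3, ignoring NA; V2 is left out of the calculation
out <- transform(DF, Sum13 = rowSums(cbind(V1, V3), na.rm = TRUE))
```

Inside transform the columns can be referred to by name, so cbind(V1, V3) builds the matrix that rowSums needs without touching the other columns.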

Answer 1: score 10, accepted, 2012-08-27 10:33:37
Answer 2: score 2, 2012-08-27 11:10:18
