
Add variables whilst ignoring NAs using the transform function

I have a data frame with a large number of variables. I am creating new variables by adding together some of the old ones. The code I am using to do so is:

name_of_data_frame<- transform(name_of_data_frame, new_variable=var1+var2 +....)

When transform comes across a NA in one of the observations, it returns "NA" in the new variable, even if some of the other variables it was adding were not NA.

For example, if var1 = 4, var2 = 3 and var3 = NA, then var1+var2+var3 inside transform gives NA, whereas I would like it to give 7.

I don't want to recode my NAs to zero within the data frame, as I may need to refer back to the NAs later and don't want to confuse them with observations that were genuinely 0.

Any help on how to get around R treating NAs in the way described above with the transform function would be great (or, if there are alternative functions to use, that would be great too).

Please note that I am not always just summing variables that are next to each other; I am also often dividing, multiplying, subtracting, etc.

My first instinct was to suggest using sum(), since then you can use the na.rm argument. However, this doesn't work, because sum() reduces its arguments to a single scalar value, not a vector.
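To make the difference concrete, here is a small sketch on made-up data (the data frame d and its values are illustrative assumptions, not taken from the question). It shows `+` propagating NA row by row, and sum() collapsing everything to one number that transform() then recycles down the whole column:

```r
# Hypothetical toy data, for illustration only
d <- data.frame(var1 = c(4, 1), var2 = c(3, NA))

# `+` is element-wise, but any NA operand makes that row's result NA
plus_result <- d$var1 + d$var2

# sum() can drop NAs, but it returns ONE total over all values
sum_result <- sum(d$var1, d$var2, na.rm = TRUE)

# Inside transform() that single total is recycled to every row
out <- transform(d, total = sum(var1, var2, na.rm = TRUE))
```

Here plus_result keeps an NA in the second row, while out$total holds the same grand total in every row, which is not the per-row sum that was asked for.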

This means you need to write a parallel sum function. Let's call this psum(), similar to the base R functions pmin() and pmax():

psum <- function(..., na.rm=FALSE) { 
  x <- list(...)
  rowSums(matrix(unlist(x), ncol=length(x)), na.rm=na.rm)
} 

Now set up some data and use psum() to get the desired vector:

dat <- data.frame(
  x = c(1,2,3, NA),
  y = c(NA, 4, 5, NA))

transform(dat, new=psum(x, y, na.rm=TRUE))
   x  y new
1  1 NA   1
2  2  4   6
3  3  5   8
4 NA NA   0
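Note that the last row, where every input is NA, comes back as 0. If that would blur the distinction between "missing" and "genuinely zero" that the question cares about, one possible variant is sketched below. The name psum_keep_na() is made up for this example and is not a base R function; it assumes all inputs are numeric vectors of equal length.

```r
# psum_keep_na() is a hypothetical helper, not part of base R
psum_keep_na <- function(...) {
  m <- cbind(...)                       # one column per input vector
  total <- rowSums(m, na.rm = TRUE)     # row sums that skip NAs
  total[rowSums(!is.na(m)) == 0] <- NA  # rows with no observed value stay NA
  total
}

dat <- data.frame(
  x = c(1, 2, 3, NA),
  y = c(NA, 4, 5, NA))

res <- transform(dat, new = psum_keep_na(x, y))
```

With this variant the first three rows are summed as before, and the all-NA row stays NA instead of becoming 0.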

Similarly, you can define a parallel product, or pprod(), like this:

pprod <- function(..., na.rm=FALSE) { 
  x <- list(...)
  m <- matrix(unlist(x), ncol=length(x))
  # pass the caller's na.rm through instead of hard-coding TRUE
  apply(m, 1, prod, na.rm=na.rm)
} 

transform(dat, new=pprod(x, y, na.rm=TRUE))
   x  y new
1  1 NA   1
2  2  4   8
3  3  5  15
4 NA NA   1

This example of pprod provides a general template for what you want to do: Create a function that uses apply() to summarize a matrix of input into the desired vector.
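As a rough sketch of that template for operators that have no na.rm argument, such as subtraction and division, the helper below drops NAs from each row and then folds the remaining values with Reduce(). The name papply() and its argument f are assumptions made up for this example, not existing R functions.

```r
# papply() is a hypothetical helper name used only for this sketch
papply <- function(f, ..., na.rm = FALSE) {
  m <- cbind(...)                            # one column per input vector
  apply(m, 1, function(row) {
    if (na.rm) row <- row[!is.na(row)]       # optionally drop missing values
    if (length(row) == 0) return(NA_real_)   # nothing left: keep the row as NA
    Reduce(f, row)                           # fold left to right: f(f(a, b), c) ...
  })
}

dat <- data.frame(
  x = c(1, 2, 3, NA),
  y = c(NA, 4, 5, NA))

res <- transform(dat,
                 diff  = papply(`-`, x, y, na.rm = TRUE),
                 ratio = papply(`/`, x, y, na.rm = TRUE))
```

One caveat with this design: for order-sensitive operations, dropping an NA shifts the remaining values to the left, so a row with x = NA and y = 4 would return 4 rather than -4. Whether that is the desired behaviour depends on the analysis.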

Using rowSums and prod could help you out.

set.seed(007) # Generating some data
DF <- data.frame(V1=sample(c(50,NA,36,24,80, NA), 15, replace=TRUE),
                 V2=sample(c(70,40,NA,25,100, NA), 15, replace=TRUE),
                 V3=sample(c(20,26,34,15,78,40), 15, replace=TRUE))

transform(DF, Sum=rowSums(DF, na.rm=TRUE)) # Sum (a vector of values)
transform(DF, Prod=apply(DF, 1, FUN=prod, na.rm=TRUE)) # Prod (a vector of values)

# Defining a function for subtracting ("resta" in Spanish :D)
# NAs are dropped first, then the remaining values are subtracted left to right
resta <- function(x) Reduce(function(a,b) a-b, x[!is.na(x)])
transform(DF, Subtracting=apply(DF, 1, resta))

# Defining a function for dividing (same idea, dividing left to right)
div <- function(x) Reduce(function(a,b) a/b, x[!is.na(x)])
transform(DF, Division=apply(DF, 1, div))
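Since the question mentions that the variables are not always next to each other, the same calls can be limited to a chosen set of columns by indexing the data frame by name. The sketch below uses a small fixed data frame and a made-up column name Sum13 so that it does not depend on the random sample above:

```r
# Small fixed data, for illustration only
DF <- data.frame(V1 = c(50, NA, 36),
                 V2 = c(70, 40, NA),
                 V3 = c(20, 26, 34))

# Pick non-adjacent columns by name
cols <- c("V1", "V3")

# Row-wise sum and product over just those columns, ignoring NAs
out <- transform(DF,
                 Sum13  = rowSums(DF[cols], na.rm = TRUE),
                 Prod13 = apply(DF[cols], 1, prod, na.rm = TRUE))
```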
