
R: Select only one element in each row that meets a specific criterion

I have a data frame that looks like this:

x <- data.frame("a.1" = c(NA, NA, 101, 101, NA),
                "a.2" = c(NA, NA, 101, NA, NA),
                "a.3" = c(101, NA, NA, NA, 103),
                "a.4" = c(NA, NA , NA, NA, 103))

Each row contains NAs and/or some 10x value. A row holds at most one distinct value, so a single row cannot contain e.g. 101 and 103 at the same time.

Now I want to create a column containing the value found in each row, irrespective of whether it appears once or several times. A row that has only NAs should also get NA. In my case the result should look like this:

   new column
1  101
2  NA
3  101
4  101
5  103

Any idea how I can do this efficiently? My original data frame is pretty large, so I'd like to avoid computationally expensive for-loops and murky ifelse statements.

Thanks in advance

EDIT:

Using rowMeans() is a pretty nice solution, as @akrun pointed out. However, in my original data set the values 101, 102, ... are in fact character strings indicating an industry. I could, of course, convert them via as.numeric, but some industry indicators have leading zeros, such as 013, 0201, etc. Those zeros are (logically) dropped when converting to numeric, so I cannot convert them.
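To illustrate the leading-zero problem described above, here is a minimal sketch. The vector `codes` and the two result names are made up for illustration; the two codes are the ones mentioned in the text.

```r
# Hypothetical industry codes stored as character strings
codes <- c("013", "0201")

# Converting to numeric drops the leading zeros
as_num <- as.numeric(codes)

# Converting back does not restore them
back <- as.character(as_num)
back
```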

What to do in this case?

We can use pmax

 x$newcolumn <- do.call(pmax, c(x, list(na.rm=TRUE)))
 x$newcolumn
 #[1] 101  NA 101 101 103
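As a rough illustration of what the do.call line does: it hands every column of x to pmax as a separate argument, together with na.rm = TRUE. The sketch below rebuilds the question's data and compares that call with the same thing written out by hand. The names `via_do_call` and `by_hand` are only illustrative.

```r
x <- data.frame(a.1 = c(NA, NA, 101, 101, NA),
                a.2 = c(NA, NA, 101, NA, NA),
                a.3 = c(101, NA, NA, NA, 103),
                a.4 = c(NA, NA, NA, NA, 103))

# do.call() passes each column of x as its own argument to pmax()
via_do_call <- do.call(pmax, c(x, list(na.rm = TRUE)))

# The same call spelled out for the four columns
by_hand <- pmax(x$a.1, x$a.2, x$a.3, x$a.4, na.rm = TRUE)

# Check that both forms agree
identical(via_do_call, by_hand)
```

Because every row holds at most one distinct value, the row-wise maximum is simply that value. With na.rm = TRUE, pmax returns NA only where all inputs are NA.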

Another option is rowMeans, as there is only a single unique element in each row.

rowMeans(x, na.rm=TRUE)
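One detail worth checking with rowMeans: a row made up entirely of NAs has nothing left to average once na.rm = TRUE removes them, so the result for that row is NaN rather than NA. A minimal sketch of handling that, using the question's data (the name `m` is just illustrative):

```r
x <- data.frame(a.1 = c(NA, NA, 101, 101, NA),
                a.2 = c(NA, NA, 101, NA, NA),
                a.3 = c(101, NA, NA, NA, 103),
                a.4 = c(NA, NA, NA, NA, 103))

# Mean of the non-NA entries; identical values average to themselves
m <- rowMeans(x, na.rm = TRUE)

# All-NA rows come back as NaN, so map those to NA explicitly
m[is.nan(m)] <- NA
m
```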

Update

If the columns are of class character and you don't want to convert them to numeric, one option is max.col

x1[cbind(1:nrow(x1),max.col(!is.na(x1), 'first'))]
#[1] "012" NA    "012" "011" "011"
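The max.col one-liner can be unpacked into steps. This is only a sketch of the same idea with intermediate names (`present`, `first_col`, `idx`, `first_value`) introduced for readability; it rebuilds x1 so that it runs on its own.

```r
x1 <- data.frame(a.1 = c(NA, NA, "012", "011", NA),
                 a.2 = c(NA, NA, "012", NA, NA),
                 a.3 = c("012", NA, NA, NA, "011"),
                 a.4 = c(NA, NA, NA, NA, "011"),
                 stringsAsFactors = FALSE)

# Logical matrix: TRUE wherever a value is present
present <- !is.na(x1)

# Column position of the first TRUE in each row
# (an all-FALSE row gives column 1, which holds NA anyway)
first_col <- max.col(present, ties.method = "first")

# Two-column matrix of (row, column) pairs used for matrix indexing
idx <- cbind(seq_len(nrow(x1)), first_col)

# Pick one cell per row
first_value <- as.matrix(x1)[idx]
first_value
```

Since the values are only looked up, never converted, the leading zeros of the character codes are kept.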

The pmax approach should also work

do.call(pmax, c(x1, na.rm=TRUE))
#[1] "012" NA    "012" "011" "011"

data

x1 <- data.frame(a.1 = c(NA, NA, '012', '011', NA),
            a.2 = c(NA, NA, '012', NA, NA),
            a.3 = c('012', NA, NA, NA, '011'),
            a.4 = c(NA, NA , NA, NA, '011'), stringsAsFactors=FALSE)

Okay, I found a solution using apply, lapply and an ifelse statement. It's not as clean as I would like, but it's reasonably fast and works:

 x1 <- data.frame(a.1 = c(NA, NA, '012', '011', NA),
        a.2 = c(NA, NA, '012', NA, NA),
        a.3 = c('012', NA, NA, NA, '011'),
        a.4 = c(NA, NA , NA, NA, '011'), stringsAsFactors=FALSE)

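library(magrittr)  # provides the %>% pipe
new.column <- x1 %>%
   # per row: the distinct non-NA values (empty for an all-NA row)
   apply(1, function(i) unique(i[!is.na(i)])) %>%
   # replace empty results with NA so that every row yields one value
   lapply(function(i) ifelse(length(i) == 0, NA, i)) %>%
   unlist()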
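The same apply idea can probably be written in base R without the pipe or the extra lapply pass by returning exactly one value per row. This is a sketch under the question's assumption that a row never holds two different values. The names `first_non_na` and `new.column2` are hypothetical.

```r
x1 <- data.frame(a.1 = c(NA, NA, "012", "011", NA),
                 a.2 = c(NA, NA, "012", NA, NA),
                 a.3 = c("012", NA, NA, NA, "011"),
                 a.4 = c(NA, NA, NA, NA, "011"),
                 stringsAsFactors = FALSE)

# Return the first non-NA entry of a row, or NA if the row has none
first_non_na <- function(i) {
  v <- i[!is.na(i)]
  if (length(v) == 0) NA_character_ else v[[1]]
}

# One value per row, so apply() simplifies to a character vector
new.column2 <- unname(apply(x1, 1, first_non_na))
new.column2
```

This still loops over rows inside apply, so on a large data frame the vectorised pmax or max.col approaches are likely to be faster.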
