I have a data frame that looks like this:
x <- data.frame("a.1" = c(NA, NA, 101, 101, NA),
"a.2" = c(NA, NA, 101, NA, NA),
"a.3" = c(101, NA, NA, NA, 103),
"a.4" = c(NA, NA , NA, NA, 103))
Each row contains only NAs, or NAs together with some 10x value. This value is unique to each row, so one row cannot contain e.g. 101 and 103 at the same time.
Now I want to create a column containing the value found in each row, irrespective of whether it appears once or many times. Each row that has only NA should also have NA. In my case this should look like this:
new column
1 101
2 NA
3 101
4 101
5 103
Any idea how I can do this in an efficient way? My original data frame is pretty large, so I'd like to avoid computationally expensive for-loops and murky ifelse statements.
Thanks in advance
EDIT:
Using rowMeans() is a pretty nice solution, as @akrun pointed out. However, in my original data set, the values 101, 102, ... are in fact character strings indicating some industry. I could, of course, convert them via as.numeric, yet I have some industry indicators with leading zeros such as 013, 0201 etc. Those zeros get dropped (logically) when converted to numeric, hence I cannot convert them.
What to do in this case?
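To illustrate the problem, here is a minimal sketch (the vector name codes is made up for this example):

```r
# Hypothetical industry codes stored as character strings
codes <- c("013", "0201", "101")

# Converting to numeric drops the leading zeros
as.numeric(codes)

# ...so converting back does not recover the original strings
as.character(as.numeric(codes)) == codes
```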
We can use pmax, passing every column as an argument via do.call:
x$newcolumn <- do.call(pmax, c(x, list(na.rm=TRUE)))
x$newcolumn
#[1] 101 NA 101 101 103
Another option is rowMeans, as there is only a single unique element in each row:
rowMeans(x, na.rm=TRUE)
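One caveat with this route: for a row that is entirely NA, rowMeans with na.rm=TRUE returns NaN rather than NA, because it averages zero values. A minimal sketch of recoding that, assuming the numeric x from the question (the name m is only for illustration):

```r
x <- data.frame(a.1 = c(NA, NA, 101, 101, NA),
                a.2 = c(NA, NA, 101, NA, NA),
                a.3 = c(101, NA, NA, NA, 103),
                a.4 = c(NA, NA, NA, NA, 103))

m <- rowMeans(x, na.rm = TRUE)  # row 2 is all NA, so its mean is NaN
m[is.nan(m)] <- NA              # recode NaN as NA to match the desired output
x$newcolumn <- m
```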
If the columns are character class and we don't want to convert them to numeric, one option is max.col:
x1[cbind(1:nrow(x1),max.col(!is.na(x1), 'first'))]
#[1] "012" NA "012" "011" "011"
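The one-liner above packs three steps together. Broken apart, it could look like the sketch below (the intermediate names present, col_idx and idx are made up for readability):

```r
x1 <- data.frame(a.1 = c(NA, NA, '012', '011', NA),
                 a.2 = c(NA, NA, '012', NA, NA),
                 a.3 = c('012', NA, NA, NA, '011'),
                 a.4 = c(NA, NA, NA, NA, '011'), stringsAsFactors = FALSE)

# Logical matrix: TRUE where a value is present
present <- !is.na(x1)

# Column position of the first TRUE in each row;
# a row with no TRUE at all falls back to column 1, which holds NA anyway
col_idx <- max.col(present, ties.method = "first")

# Two-column (row, column) matrix used to pick one cell per row
idx <- cbind(seq_len(nrow(x1)), col_idx)
result <- as.matrix(x1)[idx]
```

Because it only looks at where the values are, this works the same for character and numeric columns.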
The pmax approach should also work:
do.call(pmax, c(x1, na.rm=TRUE))
#[1] "012" NA "012" "011" "011"
Data used for the character examples:
x1 <- data.frame(a.1 = c(NA, NA, '012', '011', NA),
a.2 = c(NA, NA, '012', NA, NA),
a.3 = c('012', NA, NA, NA, '011'),
a.4 = c(NA, NA , NA, NA, '011'), stringsAsFactors=FALSE)
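If relying on the sort order of strings via pmax feels uncomfortable, a further base R possibility is to fill in missing values column by column with Reduce. This is only a sketch of an alternative, not part of the approaches above, and the helper name fill_missing is made up:

```r
x1 <- data.frame(a.1 = c(NA, NA, '012', '011', NA),
                 a.2 = c(NA, NA, '012', NA, NA),
                 a.3 = c('012', NA, NA, NA, '011'),
                 a.4 = c(NA, NA, NA, NA, '011'), stringsAsFactors = FALSE)

# Hypothetical helper: keep a where it is present, otherwise take b
fill_missing <- function(a, b) {
  missing <- is.na(a)
  a[missing] <- b[missing]
  a
}

# Fold the helper over the columns, left to right
newcolumn <- Reduce(fill_missing, x1)
```

Each step is vectorised over rows, so the only loop is over the columns.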
Okay, I found a solution using apply, lapply and an ifelse statement... not as clean as I would like, but it's reasonably fast and works:
x1 <- data.frame(a.1 = c(NA, NA, '012', '011', NA),
a.2 = c(NA, NA, '012', NA, NA),
a.3 = c('012', NA, NA, NA, '011'),
a.4 = c(NA, NA , NA, NA, '011'), stringsAsFactors=FALSE)
library(magrittr)  # provides the %>% pipe

new.column <- x1 %>%
  apply(1, function(i) unique(i[!is.na(i)])) %>%         # distinct non-NA values per row
  lapply(function(i) ifelse(length(i) == 0, NA, i)) %>%  # all-NA rows become NA
  unlist()
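The same idea can probably be shortened, since indexing an empty vector with [1] already yields NA. A sketch under that assumption, without the pipe (the name new.column2 is made up):

```r
x1 <- data.frame(a.1 = c(NA, NA, '012', '011', NA),
                 a.2 = c(NA, NA, '012', NA, NA),
                 a.3 = c('012', NA, NA, NA, '011'),
                 a.4 = c(NA, NA, NA, NA, '011'), stringsAsFactors = FALSE)

# For each row: drop the NAs and take the first remaining value;
# if nothing remains, [1] on the empty vector gives NA
new.column2 <- unname(apply(x1, 1, function(i) i[!is.na(i)][1]))
```

It still goes row by row, so on a very large data frame the vectorised approaches are likely to be faster.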