Create a new column only with non-NA values from specific columns [R]
I've got the following data structure:
> data <- data.frame(txt = paste0("f", 1:8),
a = c(NA, NA, NA, "A", "B", "A", NA, "C"),
b = c("D", "A", "C", NA, NA, NA, NA, NA),
c = c(NA, NA, NA, NA, NA, NA, "C", NA))
> data
# txt a b c
# 1 f1 <NA> D <NA>
# 2 f2 <NA> A <NA>
# 3 f3 <NA> C <NA>
# 4 f4 A <NA> <NA>
# 5 f5 B <NA> <NA>
# 6 f6 A <NA> <NA>
# 7 f7 <NA> <NA> C
# 8 f8 C <NA> <NA>
... and I want to create a new column containing the value from whichever of these columns is not NA (in theory, only one column per row).
> data$tmp <- sapply(1:nrow(data), function(i) gsub("NA", "", paste(as.data.frame(data[i,-1]), collapse = "")))
> data
# txt a b c tmp
# 1 f1 <NA> D <NA> D
# 2 f2 <NA> A <NA> A
# 3 f3 <NA> C <NA> C
# 4 f4 A <NA> <NA> A
# 5 f5 B <NA> <NA> B
# 6 f6 A <NA> <NA> A
# 7 f7 <NA> <NA> C C
# 8 f8 C <NA> <NA> C
This code seems to work as I want, but I have millions of rows and it is very slow. Could anyone help me find a better solution, please? Thanks in advance.
If there is only one column to select per row, matrix subsetting may be faster:
# data[-1] drops txt, which is never NA and would otherwise always be picked
data$tmp <- data[-1][matrix(c(seq_len(nrow(data)),
  apply(!is.na(data[-1]), 1, which.max)), ncol=2)]
Or using the approach from @Ventrilocus, transposed so that the values are read row by row:
tt <- t(data[-1])  # drop txt, which is never NA
data$tmp <- tt[!is.na(tt)]
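If there is exactly one non-NA value per row, the following works. The value columns are transposed first because logical indexing reads a matrix column by column:
m <- t(as.matrix(data[-1]))  # txt is dropped; rows of data become columns of m
data$tmp = m[!is.na(m)]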
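The logical-indexing variants rely on every row holding exactly one non-NA value among a, b and c. A minimal sketch of how that assumption could be checked beforehand (the name n_filled is only illustrative, and the data is rebuilt here so the snippet runs on its own):

```r
data <- data.frame(txt = paste0("f", 1:8),
                   a = c(NA, NA, NA, "A", "B", "A", NA, "C"),
                   b = c("D", "A", "C", NA, NA, NA, NA, NA),
                   c = c(NA, NA, NA, NA, NA, NA, "C", NA),
                   stringsAsFactors = FALSE)

# Count the non-NA entries per row, ignoring the txt column
n_filled <- rowSums(!is.na(data[-1]))

# TRUE only if every row has exactly one value to pick up
all(n_filled == 1)

# Rows that would break the logical-indexing approach (none in this example)
data[n_filled != 1, ]
```

If some rows turn out to have zero or several values, a first-non-NA approach such as coalesce or max.col is the safer choice.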
Will this work:
library(dplyr)
data %>% mutate(tmp = coalesce(a,b,c))
  txt    a    b    c tmp
1  f1 <NA>    D <NA>   D
2  f2 <NA>    A <NA>   A
3  f3 <NA>    C <NA>   C
4  f4    A <NA> <NA>   A
5  f5    B <NA> <NA>   B
6  f6    A <NA> <NA>   A
7  f7 <NA> <NA>    C   C
8  f8    C <NA> <NA>   C
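For setups without dplyr, the same "first non-NA value, left to right" logic can be sketched in base R with Reduce and ifelse. This is an alternative technique, not what coalesce does internally, and the helper name first_non_na is hypothetical:

```r
data <- data.frame(txt = paste0("f", 1:8),
                   a = c(NA, NA, NA, "A", "B", "A", NA, "C"),
                   b = c("D", "A", "C", NA, NA, NA, NA, NA),
                   c = c(NA, NA, NA, NA, NA, NA, "C", NA),
                   stringsAsFactors = FALSE)

# Keep x where it has a value, otherwise fall back to y
first_non_na <- function(x, y) ifelse(is.na(x), y, x)

# Fold over the value columns: a, then b, then c
data$tmp <- Reduce(first_non_na, data[-1])
data$tmp
#[1] "D" "A" "C" "A" "B" "A" "C" "C"
```

Because it folds over data[-1], it does not need the column names spelled out, which helps when there are many candidate columns.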
We can use row/column indexing with max.col:
data$tmp <- data[-1][cbind(seq_len(nrow(data)), max.col(!is.na(data[-1]), 'first'))]
data$tmp
#[1] "D" "A" "C" "A" "B" "A" "C" "C"
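Since the question is about speed on millions of rows, here is a rough sketch for timing the max.col approach on simulated data. The table big, its size n and the random values are all made up for illustration; timings will depend on the machine and the real data:

```r
set.seed(1)
n <- 1e5  # hypothetical size; increase towards the real row count

# Simulate one value per row, placed in a randomly chosen column
vals <- sample(c("A", "B", "C", "D"), n, replace = TRUE)
col  <- sample(1:3, n, replace = TRUE)
m <- matrix(NA_character_, nrow = n, ncol = 3,
            dimnames = list(NULL, c("a", "b", "c")))
m[cbind(seq_len(n), col)] <- vals
big <- data.frame(txt = paste0("f", seq_len(n)), m, stringsAsFactors = FALSE)

# Vectorised lookup: one (row, column) pair per row
system.time(
  big$tmp <- big[-1][cbind(seq_len(nrow(big)), max.col(!is.na(big[-1]), "first"))]
)

# Sanity check against the values that were placed in the table
identical(big$tmp, vals)
```

The original sapply version can be timed the same way on a smaller n to compare.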