R: Combining several character columns into one by replacing NA-rows
I have a data frame consisting of character variables which looks like this:
V1 V2 V3 V4 V5
1 ID Date pic1 pic2 pic3
2 1 15.06.16 11:50 abc <NA> def
3 1 16.06.16 11:19 <NA> hij <NA>
4 1 17.06.16 11:41 <NA> <NA> nop
5 2 28.05.16 11:40 tuv <NA> <NA>
6 2 29.05.16 11:39 <NA> zab <NA>
7 2 30.05.16 09:07 <NA> <NA> wxy
8 3 03.06.16 07:31 lmn <NA> <NA>
9 3 04.06.16 11:01 <NA> rst <NA>
10 3 05.06.16 13:57 <NA> <NA> opq
So on each day one of the pic variables contains a value and the rest are NA. Now I want to combine all pic values into one variable by replacing the NAs. Sorry if this is a duplicate; I've already tried a lot of suggested solutions, but nothing has worked so far. Thanks!
We can try with data.table. Convert the 'data.frame' to 'data.table' (setDT(df1)); then, grouped by 'ID' and 'Date', we unlist the Subset of Data.table (.SD) and omit the NA elements (na.omit).
library(data.table)
setDT(df1)[, .(pic = na.omit(unlist(.SD))), by = .(ID, Date)]
# ID Date pic
# 1: 1 15.06.16 11:50 abc
# 2: 1 15.06.16 11:50 def
# 3: 1 16.06.16 11:19 hij
# 4: 1 17.06.16 11:41 nop
# 5: 2 28.05.16 11:40 tuv
# 6: 2 29.05.16 11:39 zab
# 7: 2 30.05.16 09:07 wxy
# 8: 3 03.06.16 07:31 lmn
# 9: 3 04.06.16 11:01 rst
#10: 3 05.06.16 13:57 opq
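For readers without data.table, the same "one row per non-NA value" result can be sketched in base R. This is only an illustration on a two-row toy data frame; the names df, pic_cols, vals, row_id, keep and long are made up for the example and are not part of the answer above.

```r
# Toy data: the first row has two non-NA pic values, the second row has one.
df <- data.frame(
  ID   = c(1L, 1L),
  Date = c("15.06.16 11:50", "16.06.16 11:19"),
  pic1 = c("abc", NA),
  pic2 = c(NA, "hij"),
  pic3 = c("def", NA),
  stringsAsFactors = FALSE
)

pic_cols <- c("pic1", "pic2", "pic3")

# Transpose so that flattening the matrix walks through df row by row.
vals   <- t(as.matrix(df[pic_cols]))
row_id <- as.vector(col(vals))        # which df row each cell came from
keep   <- !is.na(as.vector(vals))     # drop the NA cells

# Repeat ID/Date once per surviving value and attach the values.
long <- data.frame(
  df[row_id[keep], c("ID", "Date")],
  pic = as.vector(vals)[keep],
  row.names = NULL,
  stringsAsFactors = FALSE
)

print(long)
```

As with the data.table version, a row that holds two values (here the first one) produces two output rows.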
Or another option is pmax, if there is only a single non-NA per row:
setDT(df1)[, pic := do.call(pmax, c(.SD, na.rm = TRUE)),
.SDcols = pic1:pic3][, paste0("pic", 1:3) := NULL][]
Or using dplyr:
library(dplyr)
df1 %>%
mutate(pic = pmax(pic1, pic2, pic3, na.rm=TRUE))%>%
select(-(pic1:pic3))
df1 <- structure(list(ID = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L), Date = c("15.06.16 11:50",
"16.06.16 11:19", "17.06.16 11:41", "28.05.16 11:40", "29.05.16 11:39",
"30.05.16 09:07", "03.06.16 07:31", "04.06.16 11:01", "05.06.16 13:57"
), pic1 = c("abc", NA, NA, "tuv", NA, NA, "lmn", NA, NA), pic2 = c(NA,
"hij", NA, NA, "zab", NA, NA, "rst", NA), pic3 = c("def", NA,
"nop", NA, NA, "wxy", NA, NA, "opq")), .Names = c("ID", "Date",
"pic1", "pic2", "pic3"), row.names = c(NA, -9L), class = "data.frame")
Assuming "on each day one of the pic-variables contains a value, the rest is NA", you can use coalesce from dplyr to get what you want:
library(dplyr)
result <- df1 %>% mutate(pic = coalesce(pic1, pic2, pic3)) %>%
select(-(pic1:pic3))
With the data supplied by akrun:
print(result)
## ID Date pic
##1 1 15.06.16 11:50 abc
##2 1 16.06.16 11:19 hij
##3 1 17.06.16 11:41 nop
##4 2 28.05.16 11:40 tuv
##5 2 29.05.16 11:39 zab
##6 2 30.05.16 09:07 wxy
##7 3 03.06.16 07:31 lmn
##8 3 04.06.16 11:01 rst
##9 3 05.06.16 13:57 opq
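One thing worth checking: the first row of the sample data actually holds two values (abc and def), so "first non-NA" and pmax do not agree there. The base R sketch below illustrates this on a three-row toy data frame; the Reduce/ifelse construction is only a stand-in for coalesce, and the names df, first_non_na and largest are made up for the example.

```r
# Toy data: the first row has two non-NA values, like the first row in the question.
df <- data.frame(
  pic1 = c("abc", NA, NA),
  pic2 = c(NA, "hij", NA),
  pic3 = c("def", NA, "nop"),
  stringsAsFactors = FALSE
)

# First non-NA value per row, scanning columns left to right
# (a base R stand-in for what coalesce does).
first_non_na <- Reduce(function(x, y) ifelse(is.na(x), y, x), df)

# Largest value per row in string order, ignoring NA (the pmax approach).
largest <- do.call(pmax, c(df, na.rm = TRUE))

print(first_non_na)  # "abc" "hij" "nop"
print(largest)       # "def" "hij" "nop"
```

So the two approaches only give the same result when each row really has exactly one non-NA value; otherwise the first-non-NA approach keeps the leftmost value and pmax keeps the one that sorts last.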