How do I combine multiple columns with missing data into one column?

Question

I have a (very messy) dataset that is the product of merging a few datasets. It currently looks like this:

   Study_ID Status Death_status death
1       100      1           NA    NA
2       200      1           NA    NA
3       300      0           NA    NA
4       400     NA            0    NA
5       500     NA            1    NA
6       600     NA            0    NA
7       700     NA           NA     0
8       800     NA           NA     1
9       900     NA           NA     1
10     1000     NA           NA     0

I would like to create a new column that combines all 3 of the columns for each patient.

My desired output would look something like this:

   Study_ID New_Death_Status Status Death_status death
1       100                1      1           NA    NA
2       200                1      1           NA    NA
3       300                0      0           NA    NA
4       400                0     NA            0    NA
5       500                1     NA            1    NA
6       600                0     NA            0    NA
7       700                0     NA           NA     0
8       800                1     NA           NA     1
9       900                1     NA           NA     1
10     1000                0     NA           NA     0

Where New_Death_Status has a full set of data for every patient.

How can I go about doing this?

Reproducible data:

data <- data.frame(
  Study_ID     = c("100", "200", "300", "400", "500", "600", "700", "800", "900", "1000"),
  Status       = c("1", "1", "0", "NA", "NA", "NA", "NA", "NA", "NA", "NA"),
  Death_status = c("NA", "NA", "NA", "0", "1", "0", "NA", "NA", "NA", "NA"),
  death        = c("NA", "NA", "NA", "NA", "NA", "NA", "0", "1", "1", "0")
)
data

Answer 1

Assuming we know that no patient will have a value in more than one column (meaning we can safely ignore everything after the first column with non-NA data), we can coalesce the columns.

However, your data has literal "NA" strings instead of the reserved symbol NA. I think that may be a mistake somewhere in your processing, so I'll "fix" them to be NA (the columns remain strings):

library(dplyr)
data %>%
  mutate(
    # this step just replaces the literal "NA" with the symbol NA
    across(c(Status, Death_status, death), ~ if_else(. == "NA", .[NA], .)),
    New_Death_Status = coalesce(Status, Death_status, death)
  )
#    Study_ID Status Death_status death New_Death_Status
# 1       100      1         <NA>  <NA>                1
# 2       200      1         <NA>  <NA>                1
# 3       300      0         <NA>  <NA>                0
# 4       400   <NA>            0  <NA>                0
# 5       500   <NA>            1  <NA>                1
# 6       600   <NA>            0  <NA>                0
# 7       700   <NA>         <NA>     0                0
# 8       800   <NA>         <NA>     1                1
# 9       900   <NA>         <NA>     1                1
# 10     1000   <NA>         <NA>     0                0
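
As a possible alternative to the if_else() step, here is a minimal sketch that uses dplyr's na_if() to turn the literal "NA" strings into real missing values before coalescing. The result name clean is chosen here for illustration only, and the data frame simply repeats the reproducible data from the question.

```r
library(dplyr)

# Same data as in the question: the string "NA" stands in for missing values
data <- data.frame(
  Study_ID     = c("100", "200", "300", "400", "500", "600", "700", "800", "900", "1000"),
  Status       = c("1", "1", "0", "NA", "NA", "NA", "NA", "NA", "NA", "NA"),
  Death_status = c("NA", "NA", "NA", "0", "1", "0", "NA", "NA", "NA", "NA"),
  death        = c("NA", "NA", "NA", "NA", "NA", "NA", "0", "1", "1", "0")
)

clean <- data %>%
  mutate(
    # na_if() replaces every element equal to "NA" with a real NA
    across(c(Status, Death_status, death), ~ na_if(.x, "NA")),
    # take the first non-missing value per row, left to right
    New_Death_Status = coalesce(Status, Death_status, death)
  )
```

The columns stay character vectors here; if numeric 0/1 values are wanted, they would need an explicit conversion afterwards.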

coalesce returns the first non-NA found among its vector arguments, so it will silently discard any subsequent non-NA value present. Also, it will complain if all of the classes are not the same; they are all strings here, but if your processing is changing classes
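
To make the two caveats about coalesce concrete, here is a small sketch on a made-up three-row data frame (toy, status_cols, conflicts and mismatch are hypothetical names, not part of the question's data). It counts how many status columns are filled per row so that rows where a value would be silently dropped can be inspected, and it shows that mixing an integer vector with a character vector raises an error.

```r
library(dplyr)

# Hypothetical data: the second row has values in two columns
toy <- data.frame(
  Status       = c("1", "0", NA),
  Death_status = c(NA, "1", NA),
  death        = c(NA, NA, "0")
)
status_cols <- c("Status", "Death_status", "death")

# Number of non-missing status values in each row
toy$n_present <- rowSums(!is.na(toy[status_cols]))

# coalesce() keeps the first non-NA value, reading the arguments left to right
toy$New_Death_Status <- coalesce(toy$Status, toy$Death_status, toy$death)

# Rows where more than one column was filled, so a later value was discarded
conflicts <- toy[toy$n_present > 1, ]
conflicts

# Mixing classes (integer and character here) is rejected with an error
mismatch <- tryCatch(
  coalesce(c(1L, NA), c(NA, "0")),
  error = function(e) e
)
inherits(mismatch, "error")
```

In this toy example the second row ends up with "0" from Status even though Death_status holds "1", which is exactly the kind of row worth reviewing before trusting the combined column.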

Solution 1
0 2022-07-18 20:29:48