如何用同一数据集中其他相应多列的值替换多列中的 NA？

Question

I am trying replace NA's over multiple columns with corresponding values from other columns in the df.我正在尝试用 df 中其他列的相应值替换多个列上的 NA。


df = data.frame(ID = sample(1000:9999,10),
                Age = sample(18:99,10),
                Gender = sample(c("M","F"),10, replace = TRUE),              
                Test1 = sample(60:100,10),
                Test2 = sample(60:100,10),
                Test3 = sample(60:100,10),
                Test1.x = rep(NA,10),
                Test2.x = rep(NA,10),
                Test3.x = rep(NA,10))

df$Test1[c(2,3,8)] = NA
df$Test2[c(4,10)] = NA
df$Test3[c(1,7)] = NA
df$Test1.x[c(2,3,4,8)] = sample(60:100,4)
df$Test2.x[c(4,9,10)] = sample(60:100,3)
df$Test3.x[c(1,6,7)] = sample(60:100,3)
print(df)
    ID Age Gender Test1 Test2 Test3 Test1.x Test2.x Test3.x
1  7877  40      M    78    70    NA      NA      NA      84
2  6345  54      F    NA    99    61      62      NA      NA
3  9170  41      F    NA    80    96      82      NA      NA
4  2400  83      M   100    NA   100      94      95      NA
5  5920  66      M    77    62    69      NA      NA      NA
6  2569  34      M    99    96    81      NA      NA     100
7  7879  28      M    64    71    NA      NA      NA      90
8  8652  53      F    NA    74    89      95      NA      NA
9  6357  97      F    92    86    83      NA      86      NA
10 1943  45      M    95    NA    98      NA      72      NA

I would like to replace only the NAs in the test scores with the corresponding test.x score, while using str_replace.在使用 str_replace 时，我只想用相应的 test.x 分数替换测试分数中的 NA。 My actual data frame contain more than 3 columns but all the corresponding column names are the same with the ".x" afterwards.我的实际数据框包含超过 3 列，但所有相应的列名都与之后的“.x”相同。

Any ideas to make this quick and easy?有什么想法可以让这个过程变得快速简单吗？ I'm struggling between mutating across said columns or using replace_nas.我在跨所述列突变或使用 replace_nas 之间挣扎。

Answer 1

Within dplyr we could use coalesce with across . across dplyr ，我们可以使用coalesce和 cross 。

library(dplyr)

df |>
  mutate(across(starts_with("Test") & !contains(".x"),
                ~ coalesce(., get(paste0(cur_column(), ".x")))))

Output: Output：

     ID Age Gender Test1 Test2 Test3 Test1.x Test2.x Test3.x
1  5022  90      M    94    68    79      NA      NA      79
2  1625  41      M    71    66    89      71      NA      NA
3  6438  86      M    86    94    94      86      NA      NA
4  3249  93      F    74    90    76      68      90      NA
5  7338  70      F    64    63    70      NA      NA      NA
6  9416  27      F    78    74    75      NA      NA      64
7  4374  45      F    82   100    60      NA      NA      60
8  6226  21      F    61    82    63      61      NA      NA
9  5265  97      M    83    83    68      NA      89      NA
10 5441  95      M    70    79    99      NA      79      NA

Answer 2

Using dplyover使用dplyover

library(dplyover)
df <- df %>% 
   mutate(across2(matches("Test\\d+$"), ends_with(".x"),
       coalesce, .names = "{xcol}"))

-output -输出

df
     ID Age Gender Test1 Test2 Test3 Test1.x Test2.x Test3.x
1  7877  40      M    78    70    84      NA      NA      84
2  6345  54      F    62    99    61      62      NA      NA
3  9170  41      F    82    80    96      82      NA      NA
4  2400  83      M   100    95   100      94      95      NA
5  5920  66      M    77    62    69      NA      NA      NA
6  2569  34      M    99    96    81      NA      NA     100
7  7879  28      M    64    71    90      NA      NA      90
8  8652  53      F    95    74    89      95      NA      NA
9  6357  97      F    92    86    83      NA      86      NA
10 1943  45      M    95    72    98      NA      72      NA

如何用同一数据集中其他相应多列的值替换多列中的 NA？

问题描述

2 个解决方案

解决方案1
1 2022-09-01 18:11:39

解决方案2
0 已采纳 2022-09-01 18:31:55

如何用同一数据集中其他相应多列的值替换多列中的 NA？

问题描述

2 个解决方案

解决方案1 1 2022-09-01 18:11:39

解决方案2 0 已采纳 2022-09-01 18:31:55

解决方案1
1 2022-09-01 18:11:39

解决方案2
0 已采纳 2022-09-01 18:31:55