[英]How can I replace NAs in multiple columns with the values from other corresponding multiple columns in the same data set?
I am trying replace NA's over multiple columns with corresponding values from other columns in the df.我正在尝试用 df 中其他列的相应值替换多个列上的 NA。
df = data.frame(ID = sample(1000:9999,10),
Age = sample(18:99,10),
Gender = sample(c("M","F"),10, replace = TRUE),
Test1 = sample(60:100,10),
Test2 = sample(60:100,10),
Test3 = sample(60:100,10),
Test1.x = rep(NA,10),
Test2.x = rep(NA,10),
Test3.x = rep(NA,10))
df$Test1[c(2,3,8)] = NA
df$Test2[c(4,10)] = NA
df$Test3[c(1,7)] = NA
df$Test1.x[c(2,3,4,8)] = sample(60:100,4)
df$Test2.x[c(4,9,10)] = sample(60:100,3)
df$Test3.x[c(1,6,7)] = sample(60:100,3)
print(df)
ID Age Gender Test1 Test2 Test3 Test1.x Test2.x Test3.x
1 7877 40 M 78 70 NA NA NA 84
2 6345 54 F NA 99 61 62 NA NA
3 9170 41 F NA 80 96 82 NA NA
4 2400 83 M 100 NA 100 94 95 NA
5 5920 66 M 77 62 69 NA NA NA
6 2569 34 M 99 96 81 NA NA 100
7 7879 28 M 64 71 NA NA NA 90
8 8652 53 F NA 74 89 95 NA NA
9 6357 97 F 92 86 83 NA 86 NA
10 1943 45 M 95 NA 98 NA 72 NA
I would like to replace only the NAs in the test scores with the corresponding test.x score, while using str_replace.在使用 str_replace 时,我只想用相应的 test.x 分数替换测试分数中的 NA。 My actual data frame contain more than 3 columns but all the corresponding column names are the same with the ".x" afterwards.
我的实际数据框包含超过 3 列,但所有相应的列名都与之后的“.x”相同。
Any ideas to make this quick and easy?有什么想法可以让这个过程变得快速简单吗? I'm struggling between mutating across said columns or using replace_nas.
我在跨所述列突变或使用 replace_nas 之间挣扎。
Within dplyr
we could use coalesce
with across
. across
dplyr
,我们可以使用coalesce
和 cross 。
library(dplyr)
df |>
mutate(across(starts_with("Test") & !contains(".x"),
~ coalesce(., get(paste0(cur_column(), ".x")))))
Output: Output:
ID Age Gender Test1 Test2 Test3 Test1.x Test2.x Test3.x
1 5022 90 M 94 68 79 NA NA 79
2 1625 41 M 71 66 89 71 NA NA
3 6438 86 M 86 94 94 86 NA NA
4 3249 93 F 74 90 76 68 90 NA
5 7338 70 F 64 63 70 NA NA NA
6 9416 27 F 78 74 75 NA NA 64
7 4374 45 F 82 100 60 NA NA 60
8 6226 21 F 61 82 63 61 NA NA
9 5265 97 M 83 83 68 NA 89 NA
10 5441 95 M 70 79 99 NA 79 NA
Using dplyover
使用
dplyover
library(dplyover)
df <- df %>%
mutate(across2(matches("Test\\d+$"), ends_with(".x"),
coalesce, .names = "{xcol}"))
-output -输出
df
ID Age Gender Test1 Test2 Test3 Test1.x Test2.x Test3.x
1 7877 40 M 78 70 84 NA NA 84
2 6345 54 F 62 99 61 62 NA NA
3 9170 41 F 82 80 96 82 NA NA
4 2400 83 M 100 95 100 94 95 NA
5 5920 66 M 77 62 69 NA NA NA
6 2569 34 M 99 96 81 NA NA 100
7 7879 28 M 64 71 90 NA NA 90
8 8652 53 F 95 74 89 95 NA NA
9 6357 97 F 92 86 83 NA 86 NA
10 1943 45 M 95 72 98 NA 72 NA
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.