[英]Replace NA in a Dataframe Column with a Value Only when Two Other Columns Are Also NA
library(tidyverse) 库(tidyverse)
Using the sample data below, I'm trying to replace the NA's in Col1 with the number 22222, but only when all three columns are NA. 使用下面的示例数据,我试图将Col1中的NA替换为数字22222,但前提是所有三列均为NA。 So the final result should only have 22222 in Col1 for rows 4 and 7. 因此,最终结果在第1行和第4行和第7行中应该只有22222。
I would like to use tidyverse and I'm attempting something along the lines of: 我想使用tidyverse,并且正在尝试以下方法:
DF%>%mutate_at(vars(Col1),funs(replace(.,if_else(is.na(one_of(Col1,Col2,Col3),22222,.)))))
Sample Data:( not sure if this is the correct way to create real "NA"'s (that work with is.na) in the sample data? My real data has blank cells in the Excel file, which when converted to CSV and imported to R results in NA's .) 示例数据:( 不确定是否是在示例数据中创建真实的“ NA”(与is.na一起使用)的正确方法?我的真实数据在Excel文件中具有空白单元格,当转换为CSV和导入到R会得出NA的结果 。)
Col1<-c(34564,NA,43456,NA,45655,6789,99999,87667)
Col3<-c(45673,88789,11123,NA,55676,76566,NA,NA)
Col1<-c(34564,NA,43456,NA,45655,6789,NA,87667)
Col2<-c(34565,43456,55555,NA,65433,22234,NA,98909)
DF<-data_frame(ID,Col1,Col2,Col3)
One solution could be to use mapply
function. 一种解决方案是使用mapply
函数。
#Define a function to replace missing row values
replMissing <- function(x, y, z){
ifelse(is.na(x) & is.na(y) & is.na(z), 22222, x )
}
# Call mapply and pass value of Col1, Col2 and Col3
DF$Col1 <- mapply(replMissing, DF$Col1, DF$Col2, DF$Col3)
#results
> DF
# A tibble: 8 x 4
ID Col1 Col2 Col3
<dbl> <dbl> <dbl> <dbl>
1 34564 34564 34565 45673
2 NA NA 43456 88789
3 43456 43456 55555 11123
4 NA 22222 NA NA
5 45655 45655 65433 55676
6 6789 6789 22234 76566
7 99999 22222 NA NA
8 87667 87667 98909 NA
The solution will be much simpler using data.table
. 使用data.table
解决方案将简单data.table
。
DF <- data.table(DF)
DF[is.na(Col1) & is.na(Col2) & is.na(Col3), Col1 := 22222]
# Result
> DF
ID Col1 Col2 Col3
1: 34564 34564 34565 45673
2: NA NA 43456 88789
3: 43456 43456 55555 11123
4: NA 22222 NA NA
5: 45655 45655 65433 55676
6: 6789 6789 22234 76566
7: 99999 22222 NA NA
8: 87667 87667 98909 NA
Your question has a few errors, so my answer will attempt to fill in the blanks. 您的问题有一些错误,因此我的答案将尝试填补空白。 The data frame you have provided does not contain id
, for example. 例如,您提供的数据框不包含id
。 I have modified your sample to make this reproducible. 我已修改您的样本以使其可重现。
library(dplyr)
df <- tibble(
id = c(34564, NA, 43456, NA, 45655, 6789, 99999, 87667),
col1 = c(45673, 88789, 11123, NA, 55676, 76566, NA, NA),
col2 = c(34564, NA, 43456, NA, 45655, 6789, NA, 87667),
col3 = c(34565, 43456, 55555, NA, 65433, 22234, NA, 98909)
)
To solve a single column, you can just use if/else in a normal mutate. 要解决一列问题,您只需在常规突变中使用if / else即可。
df %>%
mutate(col1 = if_else(
is.na(col1) & is.na(col2) & is.na(col3), 22222, col1
))
# # A tibble: 8 x 4
# id col1 col2 col3
# <dbl> <dbl> <dbl> <dbl>
# 1 34564 45673 34564 34565
# 2 NA 88789 NA 43456
# 3 43456 11123 43456 55555
# 4 NA 22222 NA NA
# 5 45655 55676 45655 65433
# 6 6789 76566 6789 22234
# 7 99999 22222 NA NA
# 8 87667 NA 87667 98909
Your question suggests you actually want each column mutated, not only col1
. 您的问题建议您实际上希望每个列都发生突变,而不仅仅是col1
。 You can substitute the funs(replace())
approach you started with to just apply the earlier if/else to each column listed in vars
. 您可以替代开始时使用的funs(replace())
方法,仅将较早的if / else应用到vars
列出的每个列。
df %>%
mutate_at(
vars(col1, col2, col3),
~if_else(is.na(df$col1) & is.na(df$col2) & is.na(df$col3), 22222, .)
)
# # A tibble: 8 x 4
# id col1 col2 col3
# <dbl> <dbl> <dbl> <dbl>
# 1 34564 45673 34564 34565
# 2 NA 88789 NA 43456
# 3 43456 11123 43456 55555
# 4 NA 22222 22222 22222
# 5 45655 55676 45655 65433
# 6 6789 76566 6789 22234
# 7 99999 22222 22222 22222
# 8 87667 NA 87667 98909
This solution works for any number of column. 此解决方案适用于任意数量的列。 It will replace the value with 22222
for each rows that is all NA
value in each column 对于每一行,它将用22222
替换该值,这是每一列中的所有NA
值
library(dplyr, warn.conflicts = FALSE)
Col1<-c(34564,NA,43456,NA,45655,6789,99999,87667)
Col2<-c(34565,43456,55555,NA,65433,22234,NA,98909)
Col3<-c(45673,88789,11123,NA,55676,76566,NA,NA)
DF<-data_frame(Col1,Col2,Col3)
# Find the rows with all NA. Works with any number of column
all_na <- DF %>%
is.na() %>%
apply(1, all)
# Replace the value from this rows with 2222 and keep others
DF %>%
mutate_all(funs(if_else(all_na, 22222, .)))
#> # A tibble: 8 x 3
#> Col1 Col2 Col3
#> <dbl> <dbl> <dbl>
#> 1 34564 34565 45673
#> 2 NA 43456 88789
#> 3 43456 55555 11123
#> 4 22222 22222 22222
#> 5 45655 65433 55676
#> 6 6789 22234 76566
#> 7 99999 NA NA
#> 8 87667 98909 NA
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.