
Replace NA in a Dataframe Column with a Value Only when Two Other Columns Are Also NA

library(tidyverse)

Using the sample data below, I'm trying to replace the NAs in Col1 with the number 22222, but only when all three columns are NA. So the final result should only have 22222 in Col1 for rows 4 and 7.

I would like to use tidyverse and I'm attempting something along the lines of:

DF%>%mutate_at(vars(Col1),funs(replace(.,if_else(is.na(one_of(Col1,Col2,Col3),22222,.)))))

Sample Data: (Not sure if this is the correct way to create real NAs (ones that work with is.na) in the sample data. My real data has blank cells in the Excel file, which become NAs when the file is converted to CSV and imported into R.)

Col1<-c(34564,NA,43456,NA,45655,6789,99999,87667)
Col3<-c(45673,88789,11123,NA,55676,76566,NA,NA)
Col1<-c(34564,NA,43456,NA,45655,6789,NA,87667)
Col2<-c(34565,43456,55555,NA,65433,22234,NA,98909)
DF<-data_frame(ID,Col1,Col2,Col3)
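
On the side question about NA: an unquoted NA inside c() is a real missing value, so vectors written this way do work with is.na(). A minimal check, not from the original post (x, y, csv_text and imported are made-up names used only for illustration):

```r
# Unquoted NA is a real missing value; the quoted string "NA" is not.
x <- c(34564, NA, 43456)
is.na(x)   # FALSE  TRUE FALSE

y <- c("34564", "NA", "43456")
is.na(y)   # FALSE FALSE FALSE

# Blank cells in a numeric CSV column are read as NA by read.csv().
csv_text <- "Col1,Col2\n1,\n,4\n"
imported <- read.csv(text = csv_text)
is.na(imported)
```

If a file stores missing values as some other marker, the na.strings argument of read.csv() is the usual place to declare it.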

One solution could be to use the mapply function.

#Define a function to replace missing row values
replMissing <- function(x, y, z){
  ifelse(is.na(x) & is.na(y) & is.na(z), 22222, x )
}
# Call mapply and pass value of Col1, Col2 and Col3
DF$Col1 <- mapply(replMissing, DF$Col1, DF$Col2, DF$Col3)

#results
> DF
# A tibble: 8 x 4
     ID  Col1  Col2  Col3
  <dbl> <dbl> <dbl> <dbl>
1 34564 34564 34565 45673
2    NA    NA 43456 88789
3 43456 43456 55555 11123
4    NA 22222    NA    NA
5 45655 45655 65433 55676
6  6789  6789 22234 76566
7 99999 22222    NA    NA
8 87667 87667 98909    NA
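
Because is.na() and & are already vectorized, the same result can probably be reached without mapply. The sketch below is an alternative, not the answer's method; DF2 is a hypothetical copy of the sample data in which the first vector is assumed to be ID:

```r
# Hypothetical copy of the sample data (first vector assumed to be ID)
DF2 <- data.frame(
  ID   = c(34564, NA, 43456, NA, 45655, 6789, 99999, 87667),
  Col1 = c(34564, NA, 43456, NA, 45655, 6789, NA, 87667),
  Col2 = c(34565, 43456, 55555, NA, 65433, 22234, NA, 98909),
  Col3 = c(45673, 88789, 11123, NA, 55676, 76566, NA, NA)
)

# TRUE only for rows where all three columns are missing
all_na <- is.na(DF2$Col1) & is.na(DF2$Col2) & is.na(DF2$Col3)

# Overwrite Col1 in those rows only
DF2$Col1[all_na] <- 22222
DF2$Col1   # values: 34564 NA 43456 22222 45655 6789 22222 87667
```

Row 2 keeps its NA because Col2 and Col3 are present there.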

The solution is much simpler with data.table.

library(data.table)
DF <- data.table(DF)
DF[is.na(Col1) & is.na(Col2) & is.na(Col3), Col1 := 22222]
# Result
> DF
      ID  Col1  Col2  Col3
1: 34564 34564 34565 45673
2:    NA    NA 43456 88789
3: 43456 43456 55555 11123
4:    NA 22222    NA    NA
5: 45655 45655 65433 55676
6:  6789  6789 22234 76566
7: 99999 22222    NA    NA
8: 87667 87667 98909    NA
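
If the list of columns to check is long, the condition can be built from a character vector instead of being spelled out. This is a sketch under assumptions: DT is a hypothetical copy of the sample data (first vector assumed to be ID) and cols is an illustrative name.

```r
library(data.table)

# Hypothetical copy of the sample data (first vector assumed to be ID)
DT <- data.table(
  ID   = c(34564, NA, 43456, NA, 45655, 6789, 99999, 87667),
  Col1 = c(34564, NA, 43456, NA, 45655, 6789, NA, 87667),
  Col2 = c(34565, 43456, 55555, NA, 65433, 22234, NA, 98909),
  Col3 = c(45673, 88789, 11123, NA, 55676, 76566, NA, NA)
)

cols <- c("Col1", "Col2", "Col3")   # columns that must all be NA

# Count NAs per row over the chosen columns; TRUE where every one is NA
all_na <- DT[, rowSums(is.na(.SD)) == length(cols), .SDcols = cols]

# Update Col1 by reference in the matching rows
DT[all_na, Col1 := 22222]
```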

Your question has a few errors, so my answer will attempt to fill in the blanks. The data frame you have provided does not contain id, for example. I have modified your sample to make this reproducible.

library(dplyr)

df <- tibble(
  id = c(34564, NA, 43456, NA, 45655, 6789, 99999, 87667),
  col1 = c(45673, 88789, 11123, NA, 55676, 76566, NA, NA),
  col2 = c(34564, NA, 43456, NA, 45655, 6789, NA, 87667),
  col3 = c(34565, 43456, 55555, NA, 65433, 22234, NA, 98909)
)

To solve this for a single column, you can just use if_else in a normal mutate.

df %>%
  mutate(col1 = if_else(
    is.na(col1) & is.na(col2) & is.na(col3), 22222, col1
  ))

# # A tibble: 8 x 4
#      id  col1  col2  col3
#   <dbl> <dbl> <dbl> <dbl>
# 1 34564 45673 34564 34565
# 2    NA 88789    NA 43456
# 3 43456 11123 43456 55555
# 4    NA 22222    NA    NA
# 5 45655 55676 45655 65433
# 6  6789 76566  6789 22234
# 7 99999 22222    NA    NA
# 8 87667    NA 87667 98909

Your question suggests you actually want each column mutated, not only col1. You can replace the funs(replace()) approach you started with and simply apply the earlier if/else to each column listed in vars.

df %>%
  mutate_at(
    vars(col1, col2, col3),
    ~if_else(is.na(df$col1) & is.na(df$col2) & is.na(df$col3), 22222, .)
  )

# # A tibble: 8 x 4
#      id  col1  col2  col3
#   <dbl> <dbl> <dbl> <dbl>
# 1 34564 45673 34564 34565
# 2    NA 88789    NA 43456
# 3 43456 11123 43456 55555
# 4    NA 22222 22222 22222
# 5 45655 55676 45655 65433
# 6  6789 76566  6789 22234
# 7 99999 22222 22222 22222
# 8 87667    NA 87667 98909
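
The scoped verbs such as mutate_at were superseded by across() in dplyr 1.0.0. A possible equivalent is sketched below, assuming dplyr 1.0.0 or later; df2 and all_na are illustrative names, and the row mask is computed up front so it does not change while the columns are being rewritten.

```r
library(dplyr)

# Same modified sample data as above, under a hypothetical name
df2 <- tibble(
  id   = c(34564, NA, 43456, NA, 45655, 6789, 99999, 87667),
  col1 = c(45673, 88789, 11123, NA, 55676, 76566, NA, NA),
  col2 = c(34564, NA, 43456, NA, 45655, 6789, NA, 87667),
  col3 = c(34565, 43456, 55555, NA, 65433, 22234, NA, 98909)
)

# Rows where col1, col2 and col3 are all missing
all_na <- is.na(df2$col1) & is.na(df2$col2) & is.na(df2$col3)

# Apply the same if_else to each listed column; .x is the current column
result <- df2 %>%
  mutate(across(c(col1, col2, col3), ~ if_else(all_na, 22222, .x)))
```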

This solution works for any number of columns. It replaces every value with 22222 in each row where all columns are NA.

library(dplyr, warn.conflicts = FALSE)
Col1<-c(34564,NA,43456,NA,45655,6789,99999,87667)
Col2<-c(34565,43456,55555,NA,65433,22234,NA,98909)
Col3<-c(45673,88789,11123,NA,55676,76566,NA,NA)
DF<-data_frame(Col1,Col2,Col3)

# Find the rows where every column is NA. Works with any number of columns
all_na <- DF %>%
  is.na() %>%
  apply(1, all)

# Replace the values in these rows with 22222 and keep the others
DF %>%
  mutate_all(~ if_else(all_na, 22222, .))
#> # A tibble: 8 x 3
#>    Col1  Col2  Col3
#>   <dbl> <dbl> <dbl>
#> 1 34564 34565 45673
#> 2    NA 43456 88789
#> 3 43456 55555 11123
#> 4 22222 22222 22222
#> 5 45655 65433 55676
#> 6  6789 22234 76566
#> 7 99999    NA    NA
#> 8 87667 98909    NA
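
The row mask can also be computed without apply. The base R sketch below is an alternative rather than the answer's own code; DF3 is a hypothetical plain data.frame copy of the three columns.

```r
# Hypothetical data.frame copy of the three columns used above
DF3 <- data.frame(
  Col1 = c(34564, NA, 43456, NA, 45655, 6789, 99999, 87667),
  Col2 = c(34565, 43456, 55555, NA, 65433, 22234, NA, 98909),
  Col3 = c(45673, 88789, 11123, NA, 55676, 76566, NA, NA)
)

# A row is all-NA when its NA count equals the number of columns
all_na <- rowSums(is.na(DF3)) == ncol(DF3)

# Fill every column of those rows with 22222
DF3[all_na, ] <- 22222
```

Here only row 4 is replaced, since row 7 still has a value in Col1.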
