How to select columns that contain only NA and a single unique value, and fill the NAs with that value?

Question

I have a data frame in which some columns contain only a single unique value or NA. I want to select these columns and fill the NAs in them with the column's unique non-missing value.

Here is some mock data:

df = data.frame( A = c(1,NA,1,1,NA), B = c(2,NA,5,2,5), C =c(3,3,NA,NA,NA))
#df
#   A  B   C
#1  1  2   3
#2  NA NA  3
#3  1  5   NA
#4  1  2   NA
#5  NA 5   NA

I want to obtain:

#df
#   A B   C
#1  1 2   3
#2  1 NA  3
#3  1 5   3
#4  1 2   3
#5  1 5   3

So far, I have tried:

df = df %>% 
      map_if((length(unique(na.omit(.)))== 1), ~ unique(na.omit(.)))

df = df %>% 
     mutate_if((length(unique(na.omit(.)))== 1), ~ unique(na.omit(.)))

Both gave the following error:

Error in probe(.x, .p) : length(.p) == length(.x) is not TRUE

Can somebody please tell me the correct syntax to achieve what I want?

Answer 1

Perhaps I misunderstood your question, but is this not just a matter of using tidyr's fill?

df %>% fill(A, C)
#  A  B C
#1 1  2 3
#2 1 NA 3
#3 1  5 3
#4 1  2 3
#5 1  5 3

To fill all columns, and to make sure that columns starting with an NA are also filled, we can fill values in both directions (down, then up):

df %>% fill(everything()) %>% fill(everything(), .direction = "up")
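
As a variation on the two-direction fill, the sketch below restricts the fill to columns that hold exactly one distinct non-missing value, so that B is left alone. It assumes tidyr 1.0.0 or later (for `.direction = "downup"`) and dplyr 1.0.0 or later (for `where()`); `has_single_value` is a hypothetical helper name, not something from the original answer.

```r
library(dplyr)
library(tidyr)

df <- data.frame(A = c(1, NA, 1, 1, NA),
                 B = c(2, NA, 5, 2, 5),
                 C = c(3, 3, NA, NA, NA))

# Hypothetical helper: TRUE when a column has exactly one distinct non-NA value
has_single_value <- function(x) length(unique(x[!is.na(x)])) == 1

# Fill down first, then up, but only in the qualifying columns (A and C here)
filled <- df %>% fill(where(has_single_value), .direction = "downup")
```

Because the selected columns contain a single value by construction, the fill direction cannot change the result; "downup" only guarantees that leading NAs are covered as well.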

Update

Now that I understand your question, we can use mutate_if with your conditional statement, wrapped in a function, to replace the NA values:

df %>%
    mutate_if(
        function(x) length(unique(na.omit(x))) == 1,
        function(x) replace(x, is.na(x), unique(na.omit(x))))
#  A  B C
#1 1  2 3
#2 1 NA 3
#3 1  5 3
#4 1  2 3
#5 1  5 3
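
The attempts in the question most likely fail because the predicate is written as a bare expression: inside the pipe, `.` then refers to the whole data frame, so the predicate evaluates to a single logical value instead of one result per column. Passing a function avoids this. The same idea can be sketched with `across()` and `where()`, assuming dplyr 1.0.0 or later; `has_single_value` is a hypothetical helper name.

```r
library(dplyr)

df <- data.frame(A = c(1, NA, 1, 1, NA),
                 B = c(2, NA, 5, 2, 5),
                 C = c(3, 3, NA, NA, NA))

# Hypothetical helper: the predicate is a function, so it is applied per column
has_single_value <- function(x) n_distinct(x, na.rm = TRUE) == 1

# Replace only the NA entries of qualifying columns with their first non-NA value
result <- df %>%
  mutate(across(where(has_single_value),
                ~ replace(.x, is.na(.x), .x[!is.na(.x)][1])))
```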

Answer 2

We could check the condition in mutate_if and, if it is satisfied, use the first non-NA value for the entire column:

library(tidyverse)

# funs() is deprecated in recent dplyr releases, so a formula lambda is used instead
df %>%
  mutate_if(~ n_distinct(.[!is.na(.)]) == 1, ~ .[!is.na(.)][1])


#  A  B C
#1 1  2 3
#2 1 NA 3
#3 1  5 3
#4 1  2 3
#5 1  5 3

This could also be written with na.omit, as suggested by @RHertel:

df %>% mutate_if(~ n_distinct(na.omit(.)) == 1, ~ na.omit(.)[1])

To make this clearer, we could define helper functions and use them accordingly:

only_one_unique <- function(x) {
   n_distinct(x[!is.na(x)]) == 1
}

first_non_NA_value <- function(x) {
   x[!is.na(x)][1]
}

df %>%  mutate_if(only_one_unique, first_non_NA_value)

We could keep everything in base R using the same logic:

only_one_unique <- function(x) {
   length(unique(x[!is.na(x)])) == 1
}

first_non_NA_value <- function(x) {
   x[!is.na(x)][1]
}

df[] <- lapply(df, function(x) if (only_one_unique(x)) 
                                   first_non_NA_value(x) else x)
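
The base R logic can also be wrapped in a reusable function. The sketch below is one possible arrangement, not the answer's original code: `fill_single_value` is a hypothetical name, and the extra all-NA column D is added only to illustrate an edge case. It replaces just the NA entries rather than relying on recycling a length-one value across the column.

```r
only_one_unique <- function(x) {
  length(unique(x[!is.na(x)])) == 1
}

first_non_NA_value <- function(x) {
  x[!is.na(x)][1]
}

# Hypothetical wrapper: fill NAs only in columns with exactly one distinct non-NA value
fill_single_value <- function(data) {
  data[] <- lapply(data, function(x) {
    if (only_one_unique(x)) x[is.na(x)] <- first_non_NA_value(x)
    x
  })
  data
}

df <- data.frame(A = c(1, NA, 1, 1, NA),
                 B = c(2, NA, 5, 2, 5),
                 C = c(3, 3, NA, NA, NA),
                 D = NA_real_)  # all-NA column, added for illustration

result <- fill_single_value(df)
```

A column that is entirely NA has zero distinct non-missing values, so the predicate returns FALSE and the column is left unchanged.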
