How to choose columns with only NAs and a unique value and fill NAs with that value?
I have a data frame in which some columns contain only a single unique value or NA. I want to select these columns and fill the NAs in them with the column's unique non-missing value.
Here is some mock data:
df = data.frame(A = c(1, NA, 1, 1, NA), B = c(2, NA, 5, 2, 5), C = c(3, 3, NA, NA, NA))
#df
# A B C
#1 1 2 3
#2 NA NA 3
#3 1 5 NA
#4 1 2 NA
#5 NA 5 NA
I want to obtain:
#df
# A B C
#1 1 2 3
#2 1 NA 3
#3 1 5 3
#4 1 2 3
#5 1 5 3
So far, I tried:
df = df %>%
  map_if((length(unique(na.omit(.))) == 1), ~ unique(na.omit(.)))
df = df %>%
  mutate_if((length(unique(na.omit(.))) == 1), ~ unique(na.omit(.)))
Both gave the following error:
Error in probe(.x, .p) : length(.p) == length(.x) is not TRUE
Can somebody please tell me the correct syntax to achieve what I want?
Perhaps I misunderstood your question, but is this not just a matter of fill?
df %>% fill(A, C)
# A B C
#1 1 2 3
#2 1 NA 3
#3 1 5 3
#4 1 2 3
#5 1 5 3
To fill all columns, and to also make sure that columns starting with an NA are filled, we can fill values in both directions (down, then up):
df %>% fill(everything()) %>% fill(everything(), .direction = "up")
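As a possible shorthand for filling in both directions, here is a minimal sketch that assumes a tidyr version whose fill() accepts .direction = "downup" (to my knowledge available since tidyr 1.0.0). The name `filled` is only illustrative. Note that this fills every column, so the NA in B is replaced as well, which is not what the question asks for.

```r
library(tidyr)

# Mock data from the question
df <- data.frame(A = c(1, NA, 1, 1, NA),
                 B = c(2, NA, 5, 2, 5),
                 C = c(3, 3, NA, NA, NA))

# Fill each column downward first, then upward to cover any leading NAs.
# `filled` is a hypothetical name used for illustration.
filled <- fill(df, everything(), .direction = "downup")
```

Because B has more than one distinct value, its NA in row 2 is filled from the row above, so this approach only suits cases where filling every column is acceptable.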
Now that I understood your question, we can use mutate_if with your conditional statement to replace values:
df %>%
  mutate_if(
    function(x) length(unique(na.omit(x))) == 1,
    function(x) replace(x, is.na(x), unique(na.omit(x))))
# A B C
#1 1 2 3
#2 1 NA 3
#3 1 5 3
#4 1 2 3
#5 1 5 3
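The same idea can probably be expressed with across() and where(), which newer dplyr releases (1.0.0 and later, as far as I know) offer as the successor to the scoped mutate_if verb. This is only a sketch under that assumption; `has_single_value` and `result` are hypothetical names introduced here.

```r
library(dplyr)

# Mock data from the question
df <- data.frame(A = c(1, NA, 1, 1, NA),
                 B = c(2, NA, 5, 2, 5),
                 C = c(3, 3, NA, NA, NA))

# Predicate: TRUE when a column has exactly one distinct non-missing value.
has_single_value <- function(x) n_distinct(x, na.rm = TRUE) == 1

# Replace only the NAs in the qualifying columns with that single value.
result <- df %>%
  mutate(across(where(has_single_value),
                ~ replace(.x, is.na(.x), na.omit(.x)[1])))
```

The predicate is passed as a function rather than as an already evaluated expression, which is the key difference from the attempt in the question: there, the condition was evaluated once on the whole data frame and produced a single logical value instead of one per column.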
We could check the condition in mutate_if and, if it is satisfied, use the first non-NA value for the entire column:
library(tidyverse)
df %>%
  mutate_if(~ n_distinct(.[!is.na(.)]) == 1, ~ .[!is.na(.)][1])
# A B C
#1 1 2 3
#2 1 NA 3
#3 1 5 3
#4 1 2 3
#5 1 5 3
which could also be written with na.omit, as suggested by @RHertel:
df %>% mutate_if(~ n_distinct(na.omit(.)) == 1, ~ na.omit(.)[1])
To make it clearer, we could create functions and use them accordingly:
only_one_unique <- function(x) {
  n_distinct(x[!is.na(x)]) == 1
}
first_non_NA_value <- function(x) {
  x[!is.na(x)][1]
}
df %>% mutate_if(only_one_unique, first_non_NA_value)
We could keep everything in base R using the same logic:
only_one_unique <- function(x) {
  length(unique(x[!is.na(x)])) == 1
}
first_non_NA_value <- function(x) {
  x[!is.na(x)][1]
}
df[] <- lapply(df, function(x) if (only_one_unique(x))
  first_non_NA_value(x) else x)
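A variation on the base R approach, sketched here as one possibility rather than a definitive solution, first computes which columns qualify and then overwrites only the missing entries in them. The name `single_valued` is hypothetical.

```r
# Mock data from the question
df <- data.frame(A = c(1, NA, 1, 1, NA),
                 B = c(2, NA, 5, 2, 5),
                 C = c(3, 3, NA, NA, NA))

# Named logical vector marking columns with exactly one distinct non-NA value.
single_valued <- vapply(df,
                        function(x) length(unique(x[!is.na(x)])) == 1,
                        logical(1))

# In the qualifying columns, replace only the NA entries.
df[single_valued] <- lapply(df[single_valued], function(x) {
  x[is.na(x)] <- x[!is.na(x)][1]
  x
})
```

Keeping the column selection in a separate vector makes it easy to inspect which columns were touched, for example with names(df)[single_valued].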