How to combine two observations in a data frame and fill NAs with contradicting entries
I'd like to combine like observations such that NAs in observation A are filled with entries from observation B. If observation A and observation B have contradicting entries, e.g., two different values in the same field, I'd like the resulting data frame to return an NA in that field.
Example.
Consider the following data frame:
df1 <- data.frame(APPLIANT = c("tom", "tom"),
                  PERMIT = c(31, 31),
                  ISSUED_YR = c("2018", NA),
                  TRANSFERED = c("Y", "N"))
It looks like this:
  APPLIANT PERMIT ISSUED_YR TRANSFERED
1      tom     31      2018          Y
2      tom     31      <NA>          N
I'd like my final data frame to look like this:
  APPLIANT PERMIT ISSUED_YR TRANSFERED
1      tom     31      2018         NA
I was thinking of using an apply function, maybe something like:
apply(df1, 2, FUN = function(one_col) {
  if (length(unique(one_col)) == 1) {one_col} else {one_col[!is.na(one_col)]}
})
But I'm not sure how to handle the 'contradicting' data points in an elegant way. I also don't feel that my solution is very elegant to begin with. If there is something simpler, that would be ideal!
Maybe this could help if there are only two observations involved:
library(dplyr)
df1 %>%
  mutate(across(everything(), ~ case_when(
    length(unique(.x)) > 1 & !any(is.na(.x)) ~ NA_character_,
    TRUE ~ as.character(coalesce(.x, .x[!is.na(.x)]))
  ))) %>%
  distinct()
  APPLIANT PERMIT ISSUED_YR TRANSFERED
1      tom     31      2018       <NA>
If there is more than one unique non-NA value in a column, return NA; otherwise return the non-NA value.
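library(dplyr)
df1 %>%
  group_by(APPLIANT) %>%
  # everything() is spelled out because calling across() without .cols is deprecated since dplyr 1.1.0
  summarise(across(everything(), ~ if (n_distinct(.x, na.rm = TRUE) > 1) NA else na.omit(.x)[1]))
#  APPLIANT PERMIT ISSUED_YR TRANSFERED
#  <chr>     <dbl> <chr>     <lgl>
#1 tom          31 2018      NA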
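Because the rule is applied within each group, the same pattern should carry over to data with more than one applicant. Here is a minimal sketch, assuming a hypothetical data frame df2 in which the second applicant "ann" and all of her values are made up for illustration (they are not from the question):

```r
library(dplyr)

# Hypothetical data: 'ann' and her values are invented for illustration only.
df2 <- data.frame(
  APPLIANT   = c("tom", "tom", "ann", "ann"),
  PERMIT     = c(31, 31, 12, 12),
  ISSUED_YR  = c("2018", NA, "2019", "2020"),
  TRANSFERED = c("Y", "N", NA, "N")
)

result <- df2 %>%
  group_by(APPLIANT) %>%
  summarise(
    # More than one distinct non-NA value -> NA; otherwise the first non-NA value.
    across(everything(),
           ~ if (n_distinct(.x, na.rm = TRUE) > 1) NA else na.omit(.x)[1]),
    .groups = "drop"
  )

result
```

In this sketch the row for tom matches the desired output from the question, while ann gets NA in ISSUED_YR (2019 and 2020 contradict each other) and "N" in TRANSFERED (the NA is filled from her other row).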
For some reason the above suggestions worked on my example data but not on my actual data. In my real data set, some columns are date objects; maybe this posed a problem.
What seemed to work for me, although it is not as 'pretty', was the following:
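library(dplyr)
df %>%
  # funs() is deprecated since dplyr 0.8.0; across() with a lambda applies the same logic
  mutate(across(everything(), ~ if (length(unique(.x)) == 1) {
    unique(.x)           # all entries agree
  } else if (any(is.na(.x))) {
    .x[!is.na(.x)]       # fill the NA with the non-missing entry
  } else {
    NA                   # contradicting entries
  })) %>%
  distinct()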
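If the date columns are indeed what gets in the way, one option is to return a missing value of the column's own type rather than a bare NA or a character string. The following is a minimal base R sketch, not a diagnosis of the real data: collapse_or_na is a hypothetical helper name, and the sample dates are made up. Indexing with NA_integer_ yields a single NA that keeps the class of the input, so a Date column stays a Date.

```r
# Hypothetical helper: collapse a vector to one value.
# Returns the single distinct non-NA value if there is exactly one,
# otherwise an NA of the same type/class as the input.
collapse_or_na <- function(x) {
  vals <- unique(x[!is.na(x)])
  if (length(vals) == 1) vals else x[NA_integer_]
}

# Made-up example vectors.
issued   <- as.Date(c("2018-05-01", NA))            # NA can be filled
conflict <- as.Date(c("2018-05-01", "2019-01-15"))  # contradicting dates

filled  <- collapse_or_na(issued)
clashed <- collapse_or_na(conflict)

class(filled)   # both results keep the Date class
class(clashed)

# Applied column-wise to a two-row data frame (one group):
df_dates  <- data.frame(APPLIANT = c("tom", "tom"), ISSUED = issued)
collapsed <- as.data.frame(lapply(df_dates, collapse_or_na))
```

The helper could also be passed to summarise(across(everything(), collapse_or_na)) after group_by() when there are several applicants.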