How to combine two observations in a data frame, filling NAs and returning NA for contradicting entries

Question

I'd like to combine like observations such that NAs in observation A are filled with the entries from observation B. If observation A and observation B have contradicting entries, e.g., two different values in the same field, I'd like the resulting data frame to return an NA in that field.

Example:

Consider the following data frame:

df1 <- data.frame(APPLIANT = c("tom", "tom"), 
                  PERMIT = c(31, 31), 
                  ISSUED_YR = c("2018", NA), 
                  TRANSFERED = c("Y", "N"))

It looks like this:

  APPLIANT PERMIT ISSUED_YR TRANSFERED
1      tom     31      2018          Y
2      tom     31      <NA>          N

I'd like my final data frame to look like this:

  APPLIANT PERMIT ISSUED_YR TRANSFERED
1      tom     31      2018         NA

I was thinking of using an apply function, maybe something like:

apply(df1, 2, FUN = function(one_col){
if(length(unique(one_col)) == 1){one_col}else{ one_col[!is.na(one_col)]}
})

But I'm not sure how to handle the 'contradicting' data points in an elegant way, and my solution doesn't feel that elegant to begin with. Something simpler would be ideal!

Answer 1

This might help if only two observations are involved:

library(dplyr)

df1 %>%
  mutate(across(everything(), ~ case_when(
    length(unique(.x)) > 1 & !any(is.na(.x)) ~ NA_character_,
    TRUE ~ as.character(coalesce(.x, .x[!is.na(.x)]))
  ))) %>%
  distinct()

  APPLIANT PERMIT ISSUED_YR TRANSFERED
1      tom     31      2018       <NA>

Answer 2

If a column has more than one unique non-NA value, return NA; otherwise return the non-NA value.

library(dplyr)

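df1 %>%
  group_by(APPLIANT) %>%
  # everything() is spelled out because recent dplyr versions deprecate across() without .cols
  summarise(across(everything(), ~ if (n_distinct(.x, na.rm = TRUE) > 1) NA else na.omit(.x)[1]))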

#APPLIANT PERMIT ISSUED_YR TRANSFERED
#  <chr>     <dbl> <chr>     <lgl>     
#1 tom          31 2018      NA
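
The rule in this answer can also be written as a small standalone function, which makes it easy to check on single vectors before applying it to a data frame. This is only a minimal sketch in base R; the name `collapse_values` is a hypothetical helper, not part of dplyr or of the original answer.

```r
# Hypothetical helper: collapse the values of one column into a single entry.
collapse_values <- function(x) {
  vals <- unique(x[!is.na(x)])   # distinct non-missing values
  # exactly one distinct value: keep it; none or several: return NA
  if (length(vals) == 1) vals else NA
}

collapse_values(c("2018", NA))   # one value plus an NA: the value is kept
collapse_values(c("Y", "N"))     # contradicting entries: NA
collapse_values(c(31, 31))       # identical entries: the value is kept
```

Unlike `na.omit(.)[1]`, this version also returns NA when every entry is missing, without relying on out-of-range indexing.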

Answer 3

For some reason, the suggestions above worked on my example data but not on my actual data. My real data set has some columns that are Date objects, which may have caused the problem.

What did work for me, though it is not as 'pretty', was the following:

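library(dplyr)

# df is the real data set (not shown in the question).
# funs() and mutate_all() are deprecated; across() is the current equivalent.
df %>%
  mutate(across(everything(), ~ if (length(unique(.x)) == 1) {
    unique(.x)          # all rows agree
  } else if (anyNA(.x)) {
    .x[!is.na(.x)]      # fill the NA with the non-NA entry
  } else {
    NA                  # contradicting entries
  })) %>%
  distinct()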
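
Since Date columns are mentioned as a possible culprit, here is a sketch of the same rule that keeps each column's class, so a contradicting Date column yields an NA that is still a Date. It is an assumption that type handling caused the failure on the real data; the helper name `collapse_typed` and the example data `df2` are made up for illustration.

```r
# Hypothetical helper: same rule as above, but the NA it returns is produced by
# indexing x itself, so it keeps the class of the column (Date stays Date).
collapse_typed <- function(x) {
  vals <- unique(x[!is.na(x)])
  if (length(vals) == 1) vals else x[NA_integer_]
}

# Made-up example data with Date columns (not the asker's real data).
df2 <- data.frame(
  APPLIANT  = c("tom", "tom"),
  ISSUED_ON = as.Date(c("2018-03-01", NA)),
  EXPIRES   = as.Date(c("2020-01-01", "2021-01-01"))
)

# Collapse every column to a single entry, giving a one-row data frame.
result <- as.data.frame(lapply(df2, collapse_typed))
result
```

This collapses the whole data frame into one row. For several applicants, the same helper could be used per group, for example with `group_by(APPLIANT)` followed by `summarise(across(everything(), collapse_typed))`.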
