
How to combine two observations in a data frame and fill NAs with contradicting entries

I'd like to combine like observations such that NAs in observation A are filled with entries from observation B. If observation A and observation B have contradicting entries, e.g. two different values in the same field, I'd like the resulting data frame to contain an NA in that field.

Example.

Consider the following data frame:

df1 <- data.frame(APPLIANT = c("tom", "tom"), 
                  PERMIT = c(31, 31), 
                  ISSUED_YR = c("2018", NA), 
                  TRANSFERED = c("Y", "N"))

It looks like:

  APPLIANT PERMIT ISSUED_YR TRANSFERED
1      tom     31      2018          Y
2      tom     31      <NA>          N

I'd like my final data frame to look like:

  APPLIANT PERMIT ISSUED_YR TRANSFERED
1      tom     31      2018         NA

I was thinking of using an apply function, maybe something like:

apply(df1, 2, FUN = function(one_col) {
  if (length(unique(one_col)) == 1) one_col else one_col[!is.na(one_col)]
})

But I'm not sure how to handle the 'contradicting' data points in an elegant way. I also don't feel that my solution is very elegant to begin with. If there is something simpler, that would be ideal!
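A side note on the apply() attempt, offered as a minimal sketch and not as part of the original question: apply() converts a data frame to a matrix before calling the function, so with mixed column types every column arrives as a character vector. This may matter when some columns are numeric or dates. The snippet rebuilds df1 from above so it runs on its own.

```r
# Same example data as in the question.
df1 <- data.frame(APPLIANT = c("tom", "tom"),
                  PERMIT = c(31, 31),
                  ISSUED_YR = c("2018", NA),
                  TRANSFERED = c("Y", "N"),
                  stringsAsFactors = FALSE)

# apply() goes through as.matrix(), so each column is seen as character.
classes_via_apply <- apply(df1, 2, class)

# sapply() iterates over the columns directly and keeps their real classes.
classes_via_sapply <- sapply(df1, class)

print(classes_via_apply)
print(classes_via_sapply)
```

Iterating over the columns with lapply() or sapply() avoids this coercion.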

Maybe this could help if there are only two observations involved:

library(dplyr)

df1 %>%
  mutate(across(everything(), ~ case_when(
    length(unique(.x)) > 1 & !any(is.na(.x)) ~ NA_character_,
    TRUE ~ as.character(coalesce(.x, .x[!is.na(.x)]))
  ))) %>%
  distinct()

  APPLIANT PERMIT ISSUED_YR TRANSFERED
1      tom     31      2018       <NA>

If there is more than one unique non-NA value in a column, return NA; otherwise return the non-NA value.

library(dplyr)

df1 %>%
  group_by(APPLIANT) %>%
  # everything() selects all non-grouping columns
  summarise(across(everything(), ~ if (n_distinct(., na.rm = TRUE) > 1) NA else na.omit(.)[1]))

#APPLIANT PERMIT ISSUED_YR TRANSFERED
#  <chr>     <dbl> <chr>     <lgl>     
#1 tom          31 2018      NA        
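The rule described above can also be sketched in base R as a per-column helper. The name `resolve_column` is hypothetical and not taken from either answer. This version returns an NA of the column's own type (via `x[NA_integer_]`) instead of a bare logical NA, which is one possible way to keep classes such as Date intact.

```r
# Hypothetical helper: collapse one column of a group of like observations
# into a single value.
resolve_column <- function(x) {
  vals <- unique(x[!is.na(x)])   # distinct non-missing entries
  if (length(vals) == 1) {
    vals                         # exactly one distinct value: keep it
  } else {
    x[NA_integer_]               # contradiction or all NA: NA of x's own type
  }
}

year_value  <- resolve_column(c("2018", NA))   # one known value is kept
transferred <- resolve_column(c("Y", "N"))     # two different values give NA
date_value  <- resolve_column(as.Date(c("2018-05-01", NA)))  # stays a Date

print(year_value)
print(transferred)
print(date_value)
```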

For some reason the above suggestions worked on my example data but not on my actual data. In my real data set, some columns are Date objects; maybe this posed a problem.

What seemed to work for me, but is not as 'pretty', was the following:

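library(dplyr)

df %>%
  mutate(across(everything(), ~ if (length(unique(.x)) == 1) {
    unique(.x)          # every row holds the same entry: keep it
  } else if (any(is.na(.x))) {
    .x[!is.na(.x)]      # some rows are NA: fill them from the non-NA rows
  } else {
    NA                  # no NAs but different values: contradiction
  })) %>%
  distinct()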
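The post does not say why the Date columns failed, so the following is only a sketch under assumptions: it uses base R, treats rows sharing the same key columns as like observations, and returns a typed NA so that a Date column stays a Date. The names `df2`, `ISSUED_DATE`, `resolve_column` and `combine_like_rows` are hypothetical and chosen for illustration.

```r
# Hypothetical data: the example from the question plus a Date column.
df2 <- data.frame(
  APPLIANT    = c("tom", "tom"),
  PERMIT      = c(31, 31),
  ISSUED_DATE = as.Date(c("2018-05-01", NA)),
  TRANSFERED  = c("Y", "N"),
  stringsAsFactors = FALSE
)

# One value per column: the single non-NA entry, or an NA of the same type.
resolve_column <- function(x) {
  vals <- unique(x[!is.na(x)])
  if (length(vals) == 1) vals else x[NA_integer_]
}

# Split on the key columns, collapse each group to one row, and stack.
combine_like_rows <- function(df, keys) {
  groups <- split(df, df[keys], drop = TRUE)
  merged <- lapply(groups, function(g) {
    out <- g[1, , drop = FALSE]            # template row keeps column classes
    for (col in names(g)) out[[col]] <- resolve_column(g[[col]])
    out
  })
  result <- do.call(rbind, merged)
  rownames(result) <- NULL
  result
}

result <- combine_like_rows(df2, c("APPLIANT", "PERMIT"))
print(result)
```

Note that split() drops rows whose key columns contain NA, so this sketch assumes the keys are complete.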
