For each group in R, replace NA with the value from another row in the same column - values not unique within group

Question

I have a question very similar to a previous one, but I am unable to generalize it to my case.

I have data that looks sort of like this (shown as an image in the original post).

Within each ID, I have several Vis rows. The only ones of interest to me are a and b. The data is such that, for each column (V1...V7), if an a row is present, a b row is present too, and wherever a has a value, b is missing, and vice versa. I would like to combine the a and b rows of each ID group so that I end up with a single row (either a or b, or even a new one; it doesn't really matter) with no missing data in any of the columns.

Answer 1

Based on the image shown, maybe this helps. Here I am using actual NAs, with only a couple of V columns.

We create a numeric index of the columns whose names start with 'V' followed by digits ('nm1'). We convert the 'data.frame' to a 'data.table' ( setDT(df1) ) and, grouped by 'ID', use Map to loop over the 'V' columns ( .SD[, -1, with=FALSE] ) together with the 'Vis' column ( .SD[[1]] ). In each 'V' column, the elements where 'Vis' is either 'a' or 'b' are replaced by the non-NA element ( na.omit(x[.. ), and the output is assigned back to the columns indexed by 'nm1'.

library(data.table)
nm1 <- grep('V\\d+',colnames(df1)) 

setDT(df1)[, (nm1):= Map(function(x,y) 
    replace(x, which(y %in% c('a', 'b')), na.omit(x[y %in% c('a', 'b')])), 
     .SD[,-1, with=FALSE], list(.SD[[1]])), ID]
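To see what the function passed to Map does, here is a minimal base R sketch of the same replace/na.omit step on a single column of a single ID group. The vectors vis and v1 are toy values chosen to mirror ID 2 in the sample data, and the names ab, fill and v1_filled are illustrative only, not part of the answer.

```r
# Toy vectors for one ID group (illustrative values mirroring ID 2 in the sample data)
vis <- c("a", "b", "c")
v1  <- c(1, NA, 4)

ab   <- vis %in% c("a", "b")   # TRUE for the 'a' and 'b' rows
fill <- na.omit(v1[ab])        # the one non-NA value among those rows

# Write that value into both the 'a' and the 'b' position; 'c' is left untouched
v1_filled <- replace(v1, which(ab), fill)
v1_filled
# [1] 1 1 4
```

This relies on there being exactly one non-NA value among the a/b rows of the column, which is the situation described in the question.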

We change the 'b' values to 'a'

 df1[Vis=='b', Vis := 'a']

and get the unique rows

 unique(df1)
 #   ID Vis V1 V2
 #1:  2   a  1  2
 #2:  2   c  4  5
 #3:  3   a  3  4
 #4:  4   a  2  3
 #5:  4   c  3  4
 #6:  4   d  1  1

data

df1 <- data.frame(ID= rep(c(2,3,4), c(3,2,4)), Vis=c('a', 'b', 'c', 'a', 
 'b', 'a', 'b', 'c', 'd'), V1= c(1, NA, 4, 3, NA, NA, 2, 3, 1), 
 V2= c(NA, 2, 5, 4, NA, 3, NA, 4, 1), stringsAsFactors=FALSE)

Answer 2

Just sum the values you need while removing NAs. There are more vectorized ways to do this, but the for loop is a bit clearer. Note that sum(..., na.rm=TRUE) returns 0, not NA, if a column is missing in both the a and the b row.

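# df1 is the plain data.frame from the 'data' section above
for (id in unique(df1$ID)) {
  # rows of this ID whose Vis is "a" or "b"; all other rows are kept as they are
  ab <- df1$ID == id & df1$Vis %in% c("a", "b")
  if (!any(ab)) next
  # sum each V column over those rows, ignoring NAs
  new_row <- colSums(df1[ab, -(1:2), drop = FALSE], na.rm = TRUE)
  new_row <- data.frame(ID = id, Vis = "a", as.list(new_row), stringsAsFactors = FALSE)
  # drop the original a/b rows and append the merged row
  df1 <- rbind(df1[!ab, ], new_row)
}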
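As one possible "more vectorized" variant, the sketch below uses base R's aggregate to sum the a/b rows per ID in a single call. It is only a sketch under the same assumption as the loop (at most one non-NA value per column among the a/b rows of an ID); the names ab, merged and result are illustrative, and the merged row is labelled 'a' by choice.

```r
# Sample data from the 'data' section of Answer 1
df1 <- data.frame(ID = rep(c(2, 3, 4), c(3, 2, 4)),
                  Vis = c("a", "b", "c", "a", "b", "a", "b", "c", "d"),
                  V1 = c(1, NA, 4, 3, NA, NA, 2, 3, 1),
                  V2 = c(NA, 2, 5, 4, NA, 3, NA, 4, 1),
                  stringsAsFactors = FALSE)

ab <- df1$Vis %in% c("a", "b")   # rows to be merged

# One merged row per ID: sum each V column over the a/b rows, ignoring NAs
merged <- aggregate(df1[ab, c("V1", "V2")], by = list(ID = df1$ID[ab]),
                    FUN = sum, na.rm = TRUE)
merged$Vis <- "a"                # label the merged row (arbitrary choice)

# Keep the untouched rows, append the merged ones, and restore a readable order
result <- rbind(df1[!ab, ], merged[, names(df1)])
result <- result[order(result$ID, result$Vis), ]
result
```

As with the loop, a column that is NA in both the a and the b row would come out as 0 rather than NA.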

Answer 1
1 Accepted 2015-10-19 13:36:58

Answer 2
1 2015-10-19 15:23:09
