
Replace NA with values in another row of same column for each group in R

For a given column, I need to replace the NAs in each group with the non-NA value found in another row of the same group.

Let's say the sample data looks like:

id   name
 1     a
 1     NA
 2     b
 3     NA
 3     c
 3     NA

Desired output:

id   name
 1     a
 1     a
 2     b
 3     c
 3     c
 3     c

Is there a way to do this in R?

We can use data.table to do this. Convert the 'data.frame' to a 'data.table' ( setDT(df1) ). Grouped by 'id', we replace 'name' with the first non-NA value of 'name' in that group.

library(data.table)#v1.9.5+
setDT(df1)[, name:= name[!is.na(name)][1L] , by = id]
df1
#   id name
#1:  1    a
#2:  1    a
#3:  2    b
#4:  3    c
#5:  3    c
#6:  3    c

NOTE: Here I assumed that there is only a single unique non-NA value within each 'id' group.
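The single-value assumption can be checked before filling. The following is a minimal base R sketch that rebuilds the sample data inline; n_distinct_names and bad_ids are illustrative names, not part of the answer above.

```r
# Sample data from the question
df1 <- data.frame(id = c(1L, 1L, 2L, 3L, 3L, 3L),
                  name = c("a", NA, "b", NA, "c", NA),
                  stringsAsFactors = FALSE)

# Count the distinct non-NA values of 'name' within each 'id'
n_distinct_names <- tapply(df1$name, df1$id,
                           function(v) length(unique(v[!is.na(v)])))

# ids with 0 distinct values would stay NA after the fill;
# ids with more than 1 would all receive the first value only
bad_ids <- names(n_distinct_names)[n_distinct_names != 1]
bad_ids
```

If bad_ids is empty, every group has exactly one non-NA value and the fill is unambiguous.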

Another option is to order the data by 'id' and 'name', take the unique rows by 'id', and join the original dataset with that result.

 setDT(df1)
 df1[unique(df1[order(id, name)], by='id'), on='id', name:= i.name][]
 #   id name
 #1:  1    a
 #2:  1    a
 #3:  2    b
 #4:  3    c
 #5:  3    c
 #6:  3    c

NOTE: The on argument is only available in the devel version of data.table (v1.9.5+ at the time of this answer). Instructions to install the devel version are here

Data

df1 <- structure(list(id = c(1L, 1L, 2L, 3L, 3L, 3L), name = c("a", 
NA, "b", NA, "c", NA)), .Names = c("id", "name"),
class = "data.frame",    row.names = c(NA, -6L))

Here is an approach using dplyr. Starting from the data frame x, we group by id and replace each NA with the group's non-NA value. I am assuming one unique value of name per id.

x <- data.frame(id = c(1, 1, 2, rep(3, 3)),
                name = c("a", NA, "b", NA, "c", NA),
                stringsAsFactors = FALSE)

library(dplyr)
x %>%
  group_by(id) %>%
  # assumes exactly one distinct non-NA name per id
  mutate(name = unique(name[!is.na(name)]))

Source: local data frame [6 x 2]
Groups: id

#  id name
#1  1    a
#2  1    a
#3  2    b
#4  3    c
#5  3    c
#6  3    c
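When a group has no non-NA value, or more than one, unique() returns a vector whose length is neither 1 nor the group size, and I would expect mutate() to reject it. Below is a hedged sketch of a more tolerant variant that only touches the NA entries, using dplyr's coalesce(). The data frame x2 and its extra groups are hypothetical, made up to exercise those edge cases.

```r
library(dplyr)

# Hypothetical data: id 3 has two different names, id 4 has none
x2 <- data.frame(id = c(1, 1, 3, 3, 3, 4),
                 name = c("a", NA, NA, "c", "d", NA),
                 stringsAsFactors = FALSE)

filled <- x2 %>%
  group_by(id) %>%
  # name[!is.na(name)][1] is the first non-NA value in the group, or NA if
  # there is none; coalesce() uses it only where name is currently NA
  mutate(name = coalesce(name, name[!is.na(name)][1])) %>%
  ungroup()

filled
```

With this variant, existing non-NA values are left alone and an all-NA group simply stays NA. Whether taking the first value is acceptable for groups with several distinct values depends on the data.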

Base R

# df1 is the data frame defined in the Data section above
d <- na.omit(df1)
transform(df1, name = d$name[match(id, d$id)])

Again, this assumes one unique value of name per id. It forces that, since match() only returns the first matching row for each id.
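To make the lookup explicit, the base R approach can be broken into its steps. This is a sketch on the sample data from the question; idx, filled and filled_ave are illustrative names, and the ave() line is an alternative I am adding for comparison, not part of the answer above.

```r
# Sample data from the question
df1 <- data.frame(id = c(1L, 1L, 2L, 3L, 3L, 3L),
                  name = c("a", NA, "b", NA, "c", NA),
                  stringsAsFactors = FALSE)

# Rows with a known name act as a lookup table
d <- na.omit(df1)

# For each id in df1, match() gives the position of its first occurrence in d$id
idx <- match(df1$id, d$id)

# Indexing d$name by those positions repeats the known name across the group
filled <- d$name[idx]

# Alternative grouped fill with ave(): first non-NA value within each id
filled_ave <- ave(df1$name, df1$id, FUN = function(v) v[!is.na(v)][1])
```

Both versions overwrite the whole column with one value per id, so they agree with each other only as long as each id has a single distinct name.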
