Replace NA with values in another row of same column for each group in R
For each group, I need to replace the NAs in a given column with the non-NA value from another row of the same group.
Say the sample data looks like this:
id name
1 a
1 NA
2 b
3 NA
3 c
3 NA
Desired output:
id name
1 a
1 a
2 b
3 c
3 c
3 c
Is there a way to do this in R?
We can use data.table to do this. Convert the 'data.frame' to a 'data.table' (setDT(df1)). Grouped by 'id', we replace 'name' with the first non-NA value in 'name'.
library(data.table)#v1.9.5+
setDT(df1)[, name:= name[!is.na(name)][1L] , by = id]
df1
# id name
#1: 1 a
#2: 1 a
#3: 2 b
#4: 3 c
#5: 3 c
#6: 3 c
NOTE: Here I assumed that there is only a single unique non-NA value within each 'id' group.
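The assumption in the note can be checked before filling anything. The following is only a minimal base R sketch, not part of the original answer; the names n_distinct_names and ambiguous_ids are made up for illustration, and the data is the sample from the question.

```r
# Sample data from the question, rebuilt by hand
df1 <- data.frame(id = c(1L, 1L, 2L, 3L, 3L, 3L),
                  name = c("a", NA, "b", NA, "c", NA),
                  stringsAsFactors = FALSE)

# Number of distinct non-NA 'name' values in each 'id' group
# (n_distinct_names is a hypothetical helper name)
n_distinct_names <- tapply(df1$name, df1$id,
                           function(x) length(unique(x[!is.na(x)])))

# ids for which the "single non-NA value" assumption does not hold
ambiguous_ids <- names(n_distinct_names)[n_distinct_names > 1]

print(n_distinct_names)
print(ambiguous_ids)
```

If ambiguous_ids comes back empty, every group has at most one distinct non-NA value and taking the first one loses nothing.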
Another option is to join the dataset with its unique rows (one per 'id') after ordering by 'id' and 'name'. Because order places NA last, the row kept for each 'id' holds a non-NA 'name' whenever the group has one.
setDT(df1)
df1[unique(df1[order(id, name)], by='id'), on='id', name:= i.name][]
# id name
#1: 1 a
#2: 1 a
#3: 2 b
#4: 3 c
#5: 3 c
#6: 3 c
NOTE: The on argument is only available with the devel version of data.table. Instructions to install the devel version are here
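To see why ordering before taking unique rows works, here is a rough base R sketch of the same idea. It is not the data.table join itself, and the names sorted, lookup and filled are made up for illustration. It relies on order() placing NA last by default.

```r
# Sample data from the question, rebuilt by hand
df1 <- data.frame(id = c(1L, 1L, 2L, 3L, 3L, 3L),
                  name = c("a", NA, "b", NA, "c", NA),
                  stringsAsFactors = FALSE)

# order() puts NA last, so within each id the non-NA name comes first
sorted <- df1[order(df1$id, df1$name), ]

# Keep the first row per id: a lookup table with one name per id
lookup <- sorted[!duplicated(sorted$id), ]

# Fill every row of the original data from the lookup table
filled <- df1
filled$name <- lookup$name[match(df1$id, lookup$id)]
print(filled)
```

The lookup table plays the role of the unique(..., by='id') result, and the match() step plays the role of the join on 'id'.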
df1 <- structure(list(id = c(1L, 1L, 2L, 3L, 3L, 3L), name = c("a",
NA, "b", NA, "c", NA)), .Names = c("id", "name"),
class = "data.frame", row.names = c(NA, -6L))
Here is an approach using dplyr. From the data frame x we group by id and replace NA with the relevant values. I am assuming one unique value of name per id.
x <- data.frame(id = c(1, 1, 2, rep(3, 3)),
                name = c("a", NA, "b", NA, "c", NA), stringsAsFactors = FALSE)
library(dplyr)
x %>%
  group_by(id) %>%
  # unique() must return exactly one value per id for this to work
  mutate(name = unique(name[!is.na(name)]))
Source: local data frame [6 x 2]
Groups: id
# id name
#1 1 a
#2 1 a
#3 2 b
#4 3 c
#5 3 c
#6 3 c
Base R
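# df1 is the sample data frame defined earlier in this thread
d <- na.omit(df1)
# look up, for every row, the first non-NA name recorded for its id
transform(df1, name = d$name[match(id, d$id)])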
This again assumes one unique value of name per id. If there are several, match() picks the first, so that value is forced onto the whole group.
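What "forces it" means can be shown with a group that breaks the assumption. This is just an illustrative sketch; the data frame df2 is made up and does not come from the question.

```r
# Hypothetical data: id 3 has two different non-NA names, "c" and "d"
df2 <- data.frame(id = c(3L, 3L, 3L, 3L),
                  name = c(NA, "c", "d", NA),
                  stringsAsFactors = FALSE)

# Rows with a non-NA name
d <- na.omit(df2)

# match() returns the position of the first matching id in d,
# so every row of the group receives the first non-NA name
res <- transform(df2, name = d$name[match(id, d$id)])
print(res)
```

Note that the existing "d" value is overwritten as well, so this approach is only safe when each id really has a single non-NA name.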