Combine rows with same id and delete duplicated rows
After merging some data, I have multiple rows per ID. I only want to keep multiple rows with the same ID if their data differs. An NA value should be considered equal to any other value in the same column.
df <- structure(list(id = c(1L, 2L, 2L, 2L, 3L, 3L, 4L, 4L, 4L, 5L),
v1 = structure(c(1L, 1L, NA, 1L, 1L, 1L, 1L, NA, 1L, 1L), .Label = "a", class = "factor"),
v2 = structure(c(1L, 2L, 2L, 3L, 1L, 1L, 1L, 1L, NA, 1L), .Label = c("a",
"b", "c"), class = "factor"), v3 = structure(c(1L, 1L, 1L,
1L, 1L, 1L, NA, 2L, 2L, 1L), .Label = c("a", "b"), class = "factor")), .Names = c("id",
"v1", "v2", "v3"), row.names = c(NA, -10L), class = "data.frame")
id v1 v2 v3
1 a a a
2 a b a
2 <NA> b a
2 a c a
3 a a a
3 a a a
4 a a <NA>
4 <NA> a b
4 a <NA> b
5 a a a
Desired output:
id v1 v2 v3
1 a a a
2 a b a
2 a c a
3 a a a
4 a a b
5 a a a
I would be happy with a data.table solution.
A possible solution using the data.table package:
library(data.table)
setDT(df)[, lapply(.SD, function(x) unique(na.omit(x))), by = id]
which gives:
id v1 v2 v3
1: 1 a a a
2: 2 a b a
3: 2 a c a
4: 3 a a a
5: 4 a a b
6: 5 a a a
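To see why the one-liner produces two rows for id 2, it may help to look at that group on its own. The sketch below is only an illustration: `grp`, `vals` and `n_distinct` are names made up here, and the columns are plain character vectors rather than the factors used in the question.

```r
library(data.table)

# The three rows for id 2 from the question (character columns for simplicity).
grp <- data.table(
  id = c(2L, 2L, 2L),
  v1 = c("a", NA, "a"),
  v2 = c("b", "b", "c"),
  v3 = c("a", "a", "a")
)

# Distinct non-NA values per column: v1 and v3 have one, v2 has two.
vals <- lapply(grp[, .(v1, v2, v3)], function(x) unique(na.omit(x)))
n_distinct <- lengths(vals)

# Grouped call from the answer: the length-1 results are recycled
# against v2, so id 2 ends up with two rows.
res <- grp[, lapply(.SD, function(x) unique(na.omit(x))), by = id]
print(res)
```

The result lines up here because every column in a group has either one distinct non-NA value or the same number as the longest column. If two columns of the same group had, say, 2 and 3 distinct values, they would not line up into rows cleanly, so this approach would need rethinking for such data.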
First replace every NA with a value from the same column, then keep the unique rows:
library(data.table)
dt <- as.data.table(df)
# Fill each column's NAs with that column's first value.
# Feel free to change dt[[j]][1] to na.omit(dt[[j]])[1] in case the first
# value is itself NA; it is a trade-off between performance and robustness.
for (j in seq_len(ncol(dt)))
  set(dt, which(is.na(dt[[j]])), j, dt[[j]][1])
unique(dt)
id v1 v2 v3
1: 1 a a a
2: 2 a b a
3: 2 a c a
4: 3 a a a
5: 4 a a a
6: 4 a a b
7: 5 a a a
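The output above keeps two rows for id 4 because the NA in v3 is filled with the first value of the whole column ("a") rather than with a value from the same id ("b"). A minimal sketch of a per-id variant follows, assuming it is acceptable to fill each NA with the first non-NA value found in its own group. `fill_first` is a hypothetical helper introduced here, and the data is rebuilt with character columns instead of factors for simplicity.

```r
library(data.table)

# Same data as in the question, with character columns instead of factors.
dt <- data.table(
  id = c(1L, 2L, 2L, 2L, 3L, 3L, 4L, 4L, 4L, 5L),
  v1 = c("a", "a", NA, "a", "a", "a", "a", NA, "a", "a"),
  v2 = c("a", "b", "b", "c", "a", "a", "a", "a", NA, "a"),
  v3 = c("a", "a", "a", "a", "a", "a", NA, "b", "b", "a")
)

# Hypothetical helper: replace NAs with the first non-NA value of the vector.
# If the vector is all NA, it is returned unchanged.
fill_first <- function(x) {
  x[is.na(x)] <- x[!is.na(x)][1]
  x
}

# Fill NAs within each id, then drop rows that have become identical.
cols <- c("v1", "v2", "v3")
dt[, (cols) := lapply(.SD, fill_first), by = id, .SDcols = cols]
res <- unique(dt)
print(res)
```

On this data the sketch returns one row for id 4. Note that when a column has several distinct values within one id (as v2 does for id 2), an NA in that column would be filled with whichever value comes first, which may or may not be what is wanted.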