Identify repeated subjects in R
I have the following data:
subject <- c("A-B10", "A101", "A-B10", "C101", "A101", "C01", "A101", "AB101", "A.B10")
idn <- c(101, 102, 104, 100, 98, 102, 90, 102, 78)
sn <- 1:9
mydata <- data.frame(sn, subject, idn)
  sn subject idn
1  1   A-B10 101
2  2    A101 102
3  3   A-B10 104
4  4    C101 100
5  5    A101  98
6  6     C01 102
7  7    A101  90
8  8   AB101 102
9  9   A.B10  78
I want to identify repeated subjects in a large dataset. The expected result is:
repeat [1]
  sn subject idn
1  1   A-B10 101
3  3   A-B10 104
repeat [2]
  sn subject idn
2  2    A101 102
5  5    A101  98
7  7    A101  90
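A quick first check, sketched here with base R's table(), is to count the rows per subject and keep the subjects that occur more than once. The names `counts` and `repeated` are only illustrative, and subjects are compared exactly as written (so "A-B10" and "A.B10" count as different subjects). The order of the returned names follows the locale's sort order.

```r
# Rebuild the example data from the question
subject <- c("A-B10", "A101", "A-B10", "C101", "A101", "C01", "A101", "AB101", "A.B10")
idn <- c(101, 102, 104, 100, 98, 102, 90, 102, 78)
sn <- 1:9
mydata <- data.frame(sn, subject, idn, stringsAsFactors = FALSE)

# Number of rows per distinct subject
counts <- table(mydata$subject)

# Subjects that appear on more than one row
repeated <- names(counts)[counts > 1]
print(repeated)
```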
Edit:
dup <- mydata$subject[duplicated(mydata$subject)]
mydata[mydata$subject %in% dup, ]
  sn subject idn
1  1   A-B10 101
2  2    A101 102
3  3   A-B10 104
5  5    A101  98
7  7    A101  90
lapply(dup, function(x) mydata[mydata$subject == x,])
[[1]]
  sn subject idn
1  1   A-B10 101
3  3   A-B10 104

[[2]]
  sn subject idn
2  2    A101 102
5  5    A101  98
7  7    A101  90

[[3]]
  sn subject idn
2  2    A101 102
5  5    A101  98
7  7    A101  90
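The third list element appears because duplicated() returns TRUE only for the second and later occurrences of a value, so a subject seen three times (A101 here) ends up in `dup` twice. A minimal illustration on the subject vector alone; `dup_unique` is just an illustrative name:

```r
subject <- c("A-B10", "A101", "A-B10", "C101", "A101", "C01", "A101", "AB101", "A.B10")

# duplicated() marks every occurrence after the first one, so a subject
# seen k times contributes k - 1 entries: "A-B10" once, "A101" twice
dup <- subject[duplicated(subject)]
print(dup)

# unique() collapses these to one entry per repeated subject
dup_unique <- unique(dup)
print(dup_unique)
```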
For example:
> ## dup <- mydata$subject[duplicated(mydata$subject)]
> dup <- unique(mydata$subject[duplicated(mydata$subject)]) ## sorry, edited
> mydata[mydata$subject %in% dup, ]
  sn subject idn
1  1   A-B10 101
2  2    A101 102
3  3   A-B10 104
5  5    A101  98
7  7    A101  90
> lapply(dup, function(x) mydata[mydata$subject == x,])
[[1]]
  sn subject idn
1  1   A-B10 101
3  3   A-B10 104

[[2]]
  sn subject idn
2  2    A101 102
5  5    A101  98
7  7    A101  90
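Another base-R option, sketched here under the assumption that subjects are compared exactly as written, builds a single logical mask from two duplicated() passes and then splits only the flagged rows. The names `is_repeat`, `repeats` and `groups` are hypothetical.

```r
subject <- c("A-B10", "A101", "A-B10", "C101", "A101", "C01", "A101", "AB101", "A.B10")
idn <- c(101, 102, 104, 100, 98, 102, 90, 102, 78)
sn <- 1:9
mydata <- data.frame(sn, subject, idn, stringsAsFactors = FALSE)

# Forward pass flags the 2nd and later occurrences; the backward pass
# (fromLast = TRUE) flags every occurrence except the last one.
# OR-ing them marks every row whose subject appears more than once.
is_repeat <- duplicated(mydata$subject) |
  duplicated(mydata$subject, fromLast = TRUE)

# Rows belonging to repeated subjects, in their original order
repeats <- mydata[is_repeat, ]

# One data frame per repeated subject; drop = TRUE discards empty
# groups in case subject is stored as a factor
groups <- split(repeats, repeats$subject, drop = TRUE)
print(groups)
```

Because only the repeated rows reach split(), this may avoid building a separate data frame for every subject that occurs once, which could help on a large dataset.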
Here is a different approach: first split all the data by subject, then keep only the groups that have more than one row.
sets <- split(mydata, mydata$subject)
Filter(function(x) {nrow(x)>1}, sets)
If you don't need the intermediate object, you can do it in one line:
Filter(function(x) {nrow(x)>1}, split(mydata, mydata$subject))
This gives:
> Filter(function(x) {nrow(x)>1}, split(mydata, mydata$subject))
$`A-B10`
  sn subject idn
1  1   A-B10 101
3  3   A-B10 104

$A101
  sn subject idn
2  2    A101 102
5  5    A101  98
7  7    A101  90
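If the list should be labelled like the expected output in the question ("repeat [1]", "repeat [2]", ...), the subject names returned by split() can be replaced with sequential labels. This is only a cosmetic sketch: the label format is copied from the question, not from any R convention, and it discards the subject names (which are still available in each group's `subject` column).

```r
subject <- c("A-B10", "A101", "A-B10", "C101", "A101", "C01", "A101", "AB101", "A.B10")
idn <- c(101, 102, 104, 100, 98, 102, 90, 102, 78)
sn <- 1:9
mydata <- data.frame(sn, subject, idn, stringsAsFactors = FALSE)

# Groups with more than one row, as in the answer above
groups <- Filter(function(x) nrow(x) > 1, split(mydata, mydata$subject))

# Replace the subject names with the labels used in the question
names(groups) <- paste0("repeat [", seq_along(groups), "]")
print(groups)
```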