Compare one row to all other rows in a file using R
I have a file like the one below:
P1 A,B,C
P2 B,C,D,F
P3 C,D,E,F
and I need to compare each row to all other rows to get a count of intersecting elements, like below:
P1 P2 2
P1 P3 1
P2 P3 3
Thank you,
S
It's unclear where the original data is coming from, so I assumed that you read the data into a data.frame as below:
x <- data.frame(V1 = c("a", "b", "c"),
V2 = c("b", "c", "d"),
V3 = c("c", "d", "e"),
V4 = c(NA, "f", "f"),
stringsAsFactors = FALSE
)
row.names(x) <- c("p1", "p2", "p3")
The first step is to create the combinations of all rows that you need to compare:
rowIndices <- t(combn(nrow(x), 2))
> rowIndices
[,1] [,2]
[1,] 1 2
[2,] 1 3
[3,] 2 3
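As a side note, combn() accepts any vector, so the pairs can also be built from labels rather than row numbers. The sketch below is only an illustration: the `ids` vector is a hypothetical stand-in for the row names of `x`, and the number of pairs grows as choose(n, 2).

```r
# Hypothetical stand-in for row.names(x)
ids <- c("p1", "p2", "p3")

# combn() returns one pair per column; t() puts one pair per row
pairs <- t(combn(ids, 2))
pairs

# The number of pairs equals choose(n, 2)
nrow(pairs) == choose(length(ids), 2)
```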
Then we can use that information in apply() with length() and intersect() to get what you want. Note that I also indexed into the row.names() attribute of the data.frame x to get the row names like you wanted.
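# unlist() turns each one-row data.frame into a plain character vector,
# and na.omit() drops the NA padding so it is never counted as a match
data.frame(row1 = row.names(x)[rowIndices[, 1]],
           row2 = row.names(x)[rowIndices[, 2]],
           overlap = apply(rowIndices, 1, function(y)
             length(intersect(na.omit(unlist(x[y[1], ])), na.omit(unlist(x[y[2], ])))))
)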
This gives you something like:
row1 row2 overlap
1 p1 p2 2
2 p1 p3 1
3 p2 p3 3
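If the data is still in the text layout shown in the question, the same idea can be sketched in base R without padding rows with NA. This is a minimal sketch under assumptions: the `lines` vector is typed in by hand here (in practice it might come from readLines()), and the names `sets`, `pairs`, and `res` are illustrative only.

```r
# Example lines copied from the question
lines <- c("P1 A,B,C", "P2 B,C,D,F", "P3 C,D,E,F")

# Split each line into an id and its comma-separated elements
parts <- strsplit(lines, " ", fixed = TRUE)
ids <- vapply(parts, `[`, character(1), 1)
sets <- lapply(parts, function(p) strsplit(p[2], ",", fixed = TRUE)[[1]])
names(sets) <- ids

# Count the shared elements for every unordered pair of ids
pairs <- t(combn(ids, 2))
overlap <- apply(pairs, 1, function(p) length(intersect(sets[[p[1]]], sets[[p[2]]])))
res <- data.frame(row1 = pairs[, 1], row2 = pairs[, 2], overlap = overlap)
res
```

Keeping each row as its own character vector in a list means rows of different lengths need no special handling.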
Read the example data.
txt <- "P1 A,B,C
P2 B,C,D,F
P3 C,D,E,F"
tc <- textConnection(txt)
dat <- read.table(tc,as.is=TRUE)
close(tc)
Transform to long format and use a self join with an aggregating function.
library(sqldf)  # provides sqldf() for running SQL against data frames

# One row per (id, element) pair
dat_split <- strsplit(dat$V2, ",")
dat_long <- do.call(rbind, lapply(seq_along(dat_split),
  function(x) data.frame(id = x, x = dat_split[[x]], stringsAsFactors = FALSE)))
# Self join on the element; t2.id > t1.id keeps each pair once
result <- sqldf("SELECT t1.id AS id1, t2.id AS id2, count(t1.x) AS N
  FROM dat_long AS t1 INNER JOIN dat_long AS t2 ON t1.x = t2.x
  WHERE t2.id > t1.id GROUP BY t1.id, t2.id")
Results
> result
id1 id2 N
1 1 2 2
2 1 3 1
3 2 3 3
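The same self join can probably be expressed in base R with merge() and aggregate() if installing sqldf is not an option. The following is a sketch rather than a drop-in replacement: it rebuilds the long-format data by hand so that it runs on its own, and the names `long`, `joined`, and `counts` are illustrative.

```r
# Long-format data equivalent to dat_long, rebuilt here so the block is self-contained
sets <- list(c("A", "B", "C"), c("B", "C", "D", "F"), c("C", "D", "E", "F"))
long <- data.frame(id = rep(seq_along(sets), lengths(sets)),
                   x = unlist(sets), stringsAsFactors = FALSE)

# Self join on the element column; the suffixes turn the two id columns into id1 and id2
joined <- merge(long, long, by = "x", suffixes = c("1", "2"))

# Keep each unordered pair once
joined <- joined[joined$id1 < joined$id2, ]

# Count the shared elements per pair
counts <- aggregate(joined["x"], by = joined[c("id1", "id2")], FUN = length)
names(counts)[3] <- "N"
counts <- counts[order(counts$id1, counts$id2), ]
counts
```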