Comparing multiple rows and creating a matrix in R or Excel

Question

I have a file containing multiple rows, as follows:

In file1:

a  8|2|3|4   4
b  2|3|5|6|7 5
c  8|5|6|7|9 5

a to a has 4 overlaps; similarly, a to b has 2 overlaps. To check the overlaps between the various entities, I need to generate a matrix from the details above, and the output should be a matrix of these pairwise overlap counts.

Please give me a suggestion: how can I do this? Is there any way to do it using Excel, a shell script, or R? I have written the following code, but since I am not a good coder, I couldn't get the output printed in the right format.

setwd('C:\\Users\\Desktop\\')
newmet1<-file("file.txt")
newmet2<-strsplit(readLines(newmet1),"\t")
Newmet<-sapply(newmet2, function(x) x[2:length(x)], simplify=F )

for (i in 1:length(Newmet))
{
  for (j in 1:length(Newmet))
  {
    # elements shared by entries i and j
    common <- intersect(Newmet[[i]], Newmet[[j]])
    print(length(common))
  }
}

Edited: Thanks for all the answers. I got the matrix using both Excel and R with the help of the following answers.

Answer 1

Here is a function in R that returns the number of matches between each pair of columns as a new matrix.

First we get your data into an R data.frame object:

A <- c(8,2,3,4,NA)
B <- c(2,3,5,6,7)
C <- c(8,5,6,7,9)
dataset <- data.frame(A,B,C)

Then we create a function:

count_matches <- function (x) {
  if (is.data.frame(x)) {
    n <- ncol(x)
    y <- NULL
    for (i in seq_len(n)) {
      for (j in seq_len(n)) {
        # count the non-NA values of column i that also occur in column j
        count <- sum(x[[i]][!is.na(x[[i]])] %in% x[[j]][!is.na(x[[j]])])
        y <- c(y, count)
      }
    }
    # matrix() fills column by column, so the count for (i, j) lands in row j, column i
    y <- matrix(y, nrow = n)
    colnames(y) <- names(x)
    rownames(y) <- names(x)
    return(y)
  } else {
    print('Argument must be a data.frame')
  }
}

We test the function on our dataset:

count_matches(dataset)

Which returns a matrix:

  A B C
A 4 2 1
B 2 5 3
C 1 3 5

Answer 2

If the numbers are in separate cells starting in Sheet1!A1, try

=SUM(--ISNUMBER(MATCH(Sheet1!$A1:$E1,INDEX(Sheet1!$A$1:$E$3,COLUMN(),0),0)))

starting at Sheet2!A1.

It must be entered as an array formula using Ctrl+Shift+Enter.

An alternative formula that doesn't have to start at Sheet2!A1:

=SUM(--ISNUMBER(MATCH(Sheet1!$A1:$E1,INDEX(Sheet1!$A$1:$E$3,COLUMNS($A:A),0),0)))
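To make the logic of these formulas easier to follow, here is a rough R sketch of what each Sheet2 cell computes. It is only an illustration: `sheet1` is a hypothetical stand-in for Sheet1!A1:E3 (with NA for the empty cell in the first row), and `sheet2` is an assumed name for the result.

```r
# Hypothetical stand-in for Sheet1!A1:E3; NA marks the empty cell in row 1
sheet1 <- rbind(c(8, 2, 3, 4, NA),
                c(2, 3, 5, 6, 7),
                c(8, 5, 6, 7, 9))

n <- nrow(sheet1)
sheet2 <- matrix(0L, n, n)

for (r in seq_len(n)) {      # row of the Sheet2 cell
  for (k in seq_len(n)) {    # column of the Sheet2 cell
    lookup <- sheet1[r, ]    # Sheet1!$A1:$E1, shifted to row r
    target <- sheet1[k, ]    # INDEX(Sheet1!$A$1:$E$3, k, 0), i.e. row k
    # ISNUMBER(MATCH(...)) is TRUE where a lookup value is found in target;
    # incomparables = NA keeps empty cells from matching each other
    found <- !is.na(match(lookup, target, incomparables = NA))
    sheet2[r, k] <- sum(found)
  }
}

sheet2
```

The column index `k` plays the role of COLUMN() in the first formula and COLUMNS($A:A) in the second.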

Answer 3

Using R:

# dummy data
df1 <- read.table(text = "a  8|2|3|4   4
b  2|3|5|6|7 5
c  8|5|6|7|9 5", as.is = TRUE)

df1
#   V1        V2 V3
# 1  a   8|2|3|4  4
# 2  b 2|3|5|6|7  5
# 3  c 8|5|6|7|9  5

# convert 2nd column to a list of split values
myList <- unlist(lapply(df1$V2, strsplit, split = "|", fixed = TRUE), recursive = FALSE)
names(myList) <- df1$V1
myList
# $a
# [1] "8" "2" "3" "4"
# $b
# [1] "2" "3" "5" "6" "7"
# $c
# [1] "8" "5" "6" "7" "9"

# get overlap counts
crossprod(table(stack(myList)))
#    ind
# ind a b c
#   a 4 2 1
#   b 2 5 3
#   c 1 3 5
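The one-liner above is compact, so it may help to unpack it step by step. The sketch below redefines the list so it runs on its own; `long`, `inc`, and `overlap` are names chosen here for illustration.

```r
# Same content as myList above, redefined so this block is self-contained
myList <- list(a = c("8", "2", "3", "4"),
               b = c("2", "3", "5", "6", "7"),
               c = c("8", "5", "6", "7", "9"))

# stack() turns the named list into a long data frame with
# one row per value: columns `values` and `ind` (the list name)
long <- stack(myList)

# table() cross-tabulates values against names, giving an
# incidence table: 1 if the name contains the value, else 0
inc <- table(long)

# crossprod(inc) is t(inc) %*% inc: for each pair of names it
# sums the products of their 0/1 columns, i.e. the shared values
overlap <- crossprod(inc)

overlap["a", "b"]
```

This relies on each value appearing at most once per row; a value repeated within a row would give an incidence entry above 1 and inflate the counts.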

If we remove the data-processing step, this answer is already provided by a similar post: Intersect all possible combinations of list elements
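For comparison, the same matrix can also be built by intersecting every pair of list elements directly, which is closer to the nested loop in the question. This is a minimal sketch, not taken from the linked post; the list is redefined so the block runs alone, and `overlap` is an assumed name.

```r
# Same content as myList above, redefined so this block is self-contained
myList <- list(a = c("8", "2", "3", "4"),
               b = c("2", "3", "5", "6", "7"),
               c = c("8", "5", "6", "7", "9"))

# For every pair (x, y) of list elements, count the values they share.
# The inner sapply returns a named vector; the outer one binds these
# vectors into a matrix with the list names on both dimensions.
overlap <- sapply(myList, function(x) {
  sapply(myList, function(y) length(intersect(x, y)))
})

overlap
```

Unlike the crossprod approach, intersect() drops duplicates, so repeated values within a row are counted once.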

3 answers

Answer 1
2 2016-09-09 11:51:09

Answer 2
1 Accepted 2016-09-09 09:28:19

Answer 3
1