Converting a string to a similarity matrix

Question

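I have some specially formatted strings that represent sets. In R, I would like to convert them to a similarity matrix.

For example, a string indicating that 1 and 2 form one set, 3 is a set on its own, and 4, 5 and 6 form another set is: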

"1+2,3,4+5+6"

For the example above, I would like to be able to produce

     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    1    0    0    0    0
[2,]    1    1    0    0    0    0
[3,]    0    0    1    0    0    0
[4,]    0    0    0    1    1    1
[5,]    0    0    0    1    1    1
[6,]    0    0    0    1    1    1

It seems like this should be a painfully simple task. How would I go about it?

Answer 1

Here is one approach:

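# Split at "," into sets, then split each set at "+" into numeric members
out <- lapply(unlist(strsplit("1+2,3,4+5+6", ",")), function(x) {
    as.numeric(unlist(strsplit(x, "\\+")))
})

# Cross-tabulate elements (rows) against the index of their set (columns)
x <- table(unlist(out), rep(seq_along(out), sapply(out, length)))

# x %*% t(x) is 1 where two elements share a set; matrix() drops the dimnames
matrix(x %*% t(x), nrow(x))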

##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,]    1    1    0    0    0    0
## [2,]    1    1    0    0    0    0
## [3,]    0    0    1    0    0    0
## [4,]    0    0    0    1    1    1
## [5,]    0    0    0    1    1    1
## [6,]    0    0    0    1    1    1
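
The product works because `x` is an element-by-set indicator matrix: entry (i, k) is 1 when element i belongs to set k, so `x %*% t(x)` counts the sets that elements i and j share. The same idea can be sketched without the matrix product by giving each element the index of its set and comparing labels. This is only a sketch, assuming every number from 1 to the largest element appears in exactly one set; the names `parts`, `members`, `label` and `sim` are illustrative and not from the answer.

```r
# Split the string into sets, then into the numeric members of each set
parts <- strsplit("1+2,3,4+5+6", ",")[[1]]
members <- lapply(strsplit(parts, "+", fixed = TRUE), as.numeric)

# label[i] is the index of the set that contains element i
label <- integer(max(unlist(members)))
for (k in seq_along(members)) label[members[[k]]] <- k

# Two elements are similar exactly when they carry the same set label
sim <- outer(label, label, "==") * 1
sim
```

Elements that appear in no set would all keep the label 0 and so look similar to each other, which is why the sketch assumes there are no gaps.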

Answer 2

Pseudocode:

Split at , to get an array of strings, each describing a set.
For each element of the array:
    Split at + to get an array of set members
    Mark every possible pairing of members of this set on the matrix

You can create a matrix in R with:

m = mat.or.vec(6, 6)

By default, the matrix is initialized with all entries set to 0. You can assign new values with:

m[2,3] = 1
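
Putting the pseudocode and the two snippets together might look like the following. It is a minimal sketch rather than a definitive implementation: the function name `sets_to_matrix` is hypothetical, the caller is assumed to know the matrix size `n`, and `matrix(0, n, n)` is used in place of `mat.or.vec` so that the result is a matrix for any `n`.

```r
# Hypothetical helper that follows the pseudocode step by step
sets_to_matrix <- function(s, n) {
  m <- matrix(0, n, n)                    # n x n matrix of zeros
  for (set in strsplit(s, ",")[[1]]) {    # split at "," into sets
    # split at "+" into the members of this set
    members <- as.numeric(strsplit(set, "+", fixed = TRUE)[[1]])
    m[members, members] <- 1              # mark every pairing within the set
  }
  m
}

m <- sets_to_matrix("1+2,3,4+5+6", 6)
m
```

Indexing with `m[members, members]` marks all pairings of a set in one assignment, so no inner loop over pairs is needed.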

Answer 3

Here is another approach:

# write a simple function
similarity <- function(string){
  # drop whitespace, then split into sets at "," and into members at "+"
  sets <- strsplit(gsub("[[:space:]]", "", string), ",")[[1]]
  ind <- lapply(strsplit(sets, "+", fixed = TRUE), as.numeric)
  # one row and column per element, up to the largest element
  # (also works for multi-digit and non-consecutive members)
  n <- max(unlist(ind))
  mat <- matrix(0, n, n)

  for(i in seq_along(ind)){
    mat[ind[[i]], ind[[i]]] <- 1
  }

  return(mat)

}

# Use that function
> similarity("1+2,3,4+5+6")
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    1    0    0    0    0
[2,]    1    1    0    0    0    0
[3,]    0    0    1    0    0    0
[4,]    0    0    0    1    1    1
[5,]    0    0    0    1    1    1
[6,]    0    0    0    1    1    1

# Using another string
> similarity("1+2,3,5+6+7, 8")
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,]    1    1    0    0    0    0    0    0
[2,]    1    1    0    0    0    0    0    0
[3,]    0    0    1    0    0    0    0    0
[4,]    0    0    0    0    0    0    0    0
[5,]    0    0    0    0    1    1    1    0
[6,]    0    0    0    0    1    1    1    0
[7,]    0    0    0    0    1    1    1    0
[8,]    0    0    0    0    0    0    0    1
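
In the second example, element 4 appears in no set, so its row and column are entirely zero, including the diagonal. If a missing element should instead count as similar to itself, one option is to set the diagonal afterwards. The lines below are a small sketch assuming that convention is wanted; `mat` is a hand-built stand-in for the 8 x 8 result above, not the output of the function itself.

```r
# Rebuild the 8 x 8 result by hand: sets {1,2}, {3}, {5,6,7}, {8}
mat <- matrix(0, 8, 8)
for (idx in list(1:2, 3, 5:7, 8)) mat[idx, idx] <- 1

mat[4, 4]        # 0: element 4 belongs to no set

# Treat every element as similar to itself, including missing ones
diag(mat) <- 1
mat[4, 4]        # 1
```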

3 answers

Answer 1
Score 5, accepted, 2014-01-10 19:29:56

Answer 2
Score 2, 2014-01-10 19:16:18

Answer 3
Score 1, 2014-01-10 19:36:11
