Convert a string into a similarity matrix
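I have some strings in a particular format that represent sets. In R, I would like to convert them into a similarity matrix.
For example, a string saying that 1 and 2 form one set, 3 is a set on its own, and 4, 5 and 6 form another set looks like this: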
"1+2,3,4+5+6"
For the example above, I would like to be able to produce
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    1    0    0    0    0
[2,]    1    1    0    0    0    0
[3,]    0    0    1    0    0    0
[4,]    0    0    0    1    1    1
[5,]    0    0    0    1    1    1
[6,]    0    0    0    1    1    1
This seems like it should be a painfully simple task. How would I go about it?
Here is one approach:
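# Split at "," to get the sets, then at "+" to get each set's members
out <- lapply(unlist(strsplit("1+2,3,4+5+6", ",")), function(x) {
  as.numeric(unlist(strsplit(x, "\\+")))
})
# Cross-tabulate members (rows) against the index of their set (columns)
x <- table(unlist(out), rep(seq_along(out), sapply(out, length)))
# Entry (i, j) of x %*% t(x) counts the sets that contain both i and j
matrix(x %*% t(x), nrow(x))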
##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,]    1    1    0    0    0    0
## [2,]    1    1    0    0    0    0
## [3,]    0    0    1    0    0    0
## [4,]    0    0    0    1    1    1
## [5,]    0    0    0    1    1    1
## [6,]    0    0    0    1    1    1
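The product works because x is an element-by-set incidence matrix: row i of x has a 1 in the column of the set that contains i, so the (i, j) entry of x %*% t(x) is 1 exactly when i and j share a set. Below is a minimal sketch of the same idea wrapped in a function. The name incidence_similarity and the explicit factor levels are assumptions that are not part of the answer above; the levels are there so that an element missing from the string still gets an all-zero row and column.

```r
incidence_similarity <- function(string) {
  # One numeric vector of members per set
  sets <- lapply(strsplit(strsplit(string, ",")[[1]], "\\+"), as.numeric)
  members <- unlist(sets)
  # Index of the set each member belongs to, e.g. 1 1 2 3 3 3
  set_id <- rep(seq_along(sets), lengths(sets))
  # Rows are elements 1..max, columns are sets; explicit levels keep absent elements
  inc <- table(factor(members, levels = seq_len(max(members))), set_id)
  # Entry (i, j) counts the sets that contain both i and j
  matrix(inc %*% t(inc), nrow(inc))
}

incidence_similarity("1+2,3,4+5+6")
```

If an element were listed in more than one set, the corresponding entries would be counts greater than 1 rather than a 0/1 flag.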
Pseudocode:
Split at , to get an array of strings, each describing a set.
For each element of the array:
    Split at + to get an array of set members
    Mark every possible pairing of members of this set on the matrix
You can create a matrix in R with:
m = mat.or.vec(6, 6)
By default, every entry of the matrix is initialized to 0. You can assign a new value with:
m[2,3] = 1
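The pseudocode above could be sketched in R roughly as follows. The function name mark_pairs is hypothetical, and the sketch assumes that the set members are positive integers and that the matrix should be as large as the largest member.

```r
# Hypothetical helper; assumes members are positive integers
mark_pairs <- function(string) {
  # Split at "," to get one string per set
  set_strings <- strsplit(string, ",")[[1]]
  # Split each set at "+" to get its members
  member_list <- lapply(strsplit(set_strings, "+", fixed = TRUE), as.integer)
  n <- max(unlist(member_list))
  m <- matrix(0, n, n)
  # Mark every pairing of members within each set
  for (members in member_list) {
    for (i in members) {
      for (j in members) {
        m[i, j] <- 1
      }
    }
  }
  m
}

mark_pairs("1+2,3,4+5+6")
```

The two inner loops spell out the "every possible pairing" step; the single assignment m[members, members] <- 1 has the same effect.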
Here is another approach:
# write a simple function
similarity <- function(string){
  # split at "," to get the sets, then at "+" to get each set's members
  sets <- lapply(strsplit(strsplit(string, ",")[[1]], "\\+"), as.numeric)
  # the matrix must be as large as the largest member
  n <- max(unlist(sets))
  mat <- mat.or.vec(n, n)
  # fill the block of rows and columns belonging to each set with 1
  for(ind in sets){
    mat[ind, ind] <- 1
  }
  return(mat)
}
# Use that function
> similarity("1+2,3,4+5+6")
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    1    0    0    0    0
[2,]    1    1    0    0    0    0
[3,]    0    0    1    0    0    0
[4,]    0    0    0    1    1    1
[5,]    0    0    0    1    1    1
[6,]    0    0    0    1    1    1
# Using other string
> similarity("1+2,3,5+6+7, 8")
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,]    1    1    0    0    0    0    0    0
[2,]    1    1    0    0    0    0    0    0
[3,]    0    0    1    0    0    0    0    0
[4,]    0    0    0    0    0    0    0    0
[5,]    0    0    0    0    1    1    1    0
[6,]    0    0    0    0    1    1    1    0
[7,]    0    0    0    0    1    1    1    0
[8,]    0    0    0    0    0    0    0    1