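Efficient way to convert CSV to Sparse Matrix in R
I have a very large CSV file of similarities between keywords (about 91 million rows, so a for loop takes too long in R). When I read it into a data.frame, it looks like this: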
> df
kwd1 kwd2 similarity
a b 1
b a 1
c a 2
a c 2
This is a sparse list, and I would like to convert it into a sparse matrix:
> myMatrix
a b c
a . 1 2
b 1 . .
c 2 . .
I tried using sparseMatrix(), but converting the keyword names to integer indices takes too much time.
Thanks for your help!
acast from the reshape2 package will do this nicely. There are base R solutions, but I find the syntax much more difficult.
library(reshape2)
df <- structure(list(kwd1 = structure(c(1L, 2L, 3L, 1L), .Label = c("a",
"b", "c"), class = "factor"), kwd2 = structure(c(2L, 1L, 1L,
3L), .Label = c("a", "b", "c"), class = "factor"), similarity = c(1L,
1L, 2L, 2L)), .Names = c("kwd1", "kwd2", "similarity"), class = "data.frame", row.names = c(NA,
-4L))
acast(df, kwd1 ~ kwd2, value.var='similarity', fill=0)
a b c
a 0 1 2
b 1 0 0
c 2 0 0
>
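For comparison, one base R route is xtabs() with sparse = TRUE, which returns a sparse matrix instead of the dense one that acast produces. This is only a minimal sketch on the toy data from the question, not necessarily the base R solution the answer had in mind, and it has not been timed on 91 million rows:

```r
library(Matrix)  # xtabs(sparse = TRUE) relies on the Matrix package

# Toy data mirroring the example in the question
df <- data.frame(
  kwd1 = c("a", "b", "c", "a"),
  kwd2 = c("b", "a", "a", "c"),
  similarity = c(1, 1, 2, 2),
  stringsAsFactors = TRUE
)

# Sum similarity for each (kwd1, kwd2) pair and store the result sparsely
m <- xtabs(similarity ~ kwd1 + kwd2, data = df, sparse = TRUE)
print(m)
```

Because xtabs() sums the left-hand side over duplicate (kwd1, kwd2) pairs, repeated pairs in the input would be added together rather than overwritten.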
Using sparseMatrix from the Matrix package:
library(Matrix)
df$kwd1 <- factor(df$kwd1)
df$kwd2 <- factor(df$kwd2)
foo <- sparseMatrix(as.integer(df$kwd1), as.integer(df$kwd2), x=df$similarity)
> foo
3 x 3 sparse Matrix of class "dgCMatrix"
foo <- sparseMatrix(as.integer(df$kwd1), as.integer(df$kwd2), x=df$similarity, dimnames=list(levels(df$kwd1), levels(df$kwd2)))
> foo
3 x 3 sparse Matrix of class "dgCMatrix"
a b c
a . 1 2
b 1 . .
c 2 . .
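One caveat with the approach above: the two columns are converted to factors independently, so the row and column indices only line up when kwd1 and kwd2 contain exactly the same set of keywords. The sketch below is a variation under that concern, not part of the original answer. It builds one shared vocabulary and uses match() for the indices. The toy data and the names keywords, i, and j are made up for illustration.

```r
library(Matrix)

# Toy data in which the two keyword columns do NOT hold the same set of values
df <- data.frame(
  kwd1 = c("a", "b", "c", "a"),
  kwd2 = c("b", "a", "a", "d"),
  similarity = c(1, 1, 2, 3),
  stringsAsFactors = FALSE
)

# One shared, sorted vocabulary used for both rows and columns
keywords <- sort(unique(c(df$kwd1, df$kwd2)))

# match() maps each keyword to its position in the shared vocabulary
i <- match(df$kwd1, keywords)
j <- match(df$kwd2, keywords)

# Square sparse matrix whose rows and columns are labelled identically
m <- sparseMatrix(i = i, j = j, x = df$similarity,
                  dims = c(length(keywords), length(keywords)),
                  dimnames = list(keywords, keywords))
print(m)
```

Here "d" appears only in kwd2, yet it still gets both a row and a column, so m["a", "d"] and m["d", "a"] refer to the same pair of keywords.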