A more efficient way to populate a matrix

Question

I am trying to cbind a very large matrix to a data frame, and because of the size of the matrix I run into memory problems.

I have the data:

set.seed(123)
df1 <- data.frame(replicate(5, sample(1:20, 10, rep=TRUE)))
colnames(df1) <- c("col1", "col2", "col3", "col4", "important_col")
df2 <- data.frame(replicate(20, sample(0:0, nrow(df1), rep=TRUE)))
colnames(df2) <- gsub("X", "", colnames(df2))
df_fin <- cbind(df1, df2)

The following works and does what I want on a small example, but when applied to many thousands of rows and 1000+ columns I run into memory problems.

vecp <- colnames(df2)

imp_col <- df1$important_col

matrix <-  matrix(vecp, byrow = TRUE,
                           nrow = length(imp_col),
                           ncol = length(vecp),
                           dimnames = list(1:length(imp_col), vecp))

d <- ifelse(matrix == imp_col, 1, 0)


df_fin <- cbind(df1, d)

The line I am trying to make more efficient is d <- ifelse(matrix == imp_col, 1, 0), which is where I run into the memory problems.

Is there a way to make the matrix a sparse matrix before applying the ifelse statement?

I build the following matrix:

   col1 col2 col3 col4 important_col 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
1    11   14    3   11             1 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0
2     1    1   19   15             4 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0
3     3   17   10   10             6 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0
4    13   10    8   17            10 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0
5    18    5    3   18            19 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0
6    11   10    9    5            17 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0
7     5   11   18   16            17 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0
8     5    8   13    8             6 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0
9    10    1    7   16            12 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0
10    4   17   17    3             4 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0

The final product looks like this:

   col1 col2 col3 col4 important_col 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
1     6   20   18   20             3 0 0 1 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0
2    16   10   14   19             9 0 0 0 0 0 0 0 0 1  0  0  0  0  0  0  0  0  0  0  0
3     9   14   13   14             9 0 0 0 0 0 0 0 0 1  0  0  0  0  0  0  0  0  0  0  0
4    18   12   20   16             8 0 0 0 0 0 0 0 1 0  0  0  0  0  0  0  0  0  0  0  0
5    19    3   14    1             4 0 0 0 1 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0
6     1   18   15   10             3 0 0 1 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0
7    11    5   11   16             5 0 0 0 0 1 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0
8    18    1   12    5            10 0 0 0 0 0 0 0 0 0  1  0  0  0  0  0  0  0  0  0  0
9    12    7    6    7             6 0 0 0 0 0 1 0 0 0  0  0  0  0  0  0  0  0  0  0  0
10   10   20    3    5            18 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  1  0  0

I then turn it into a sparse matrix.

Answer 1

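The problem is that d is the same size as matrix, so if matrix is large you end up with two objects of that size in memory. One possible option (though probably slower) is to loop over the columns and change them one at a time, which only creates objects the size of a single column of the matrix. You could try: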

for (i in 1:ncol(matrix)) matrix[, i] <- matrix[, i] == imp_col

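The comparison returns a logical vector, but if your matrix holds integers, the values are converted to 0 and 1 on assignment. Note that the matrix in the question is built from column names and is therefore character, so there the results would be stored as the strings "TRUE" and "FALSE"; the matrix needs to be numeric for the 0/1 conversion to happen.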
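
Since the end goal in the question is a sparse matrix, another option is to skip the dense matrix and the ifelse call altogether and build the sparse matrix directly from row and column positions. The following is only a sketch of that idea, not what the answer above proposes: it assumes the Matrix package is available, that each row should get a single 1 in the column whose name equals important_col, and the names j, keep and d_sparse are made up for illustration.

library(Matrix)  # assumption: the Matrix package is installed

set.seed(123)
df1 <- data.frame(replicate(5, sample(1:20, 10, replace = TRUE)))
colnames(df1) <- c("col1", "col2", "col3", "col4", "important_col")

vecp <- as.character(1:20)      # same column names as df2 in the question
imp_col <- df1$important_col

# Column position of each row's value; NA if no column has that name
j <- match(as.character(imp_col), vecp)
keep <- !is.na(j)

# Store only the 1s: one (row, column) pair per matching row
d_sparse <- sparseMatrix(
  i = which(keep),
  j = j[keep],
  x = rep(1, sum(keep)),
  dims = c(length(imp_col), length(vecp)),
  dimnames = list(NULL, vecp)
)

This stores at most one value per row instead of nrow * ncol values, and no intermediate character matrix is created. It is probably best to keep d_sparse as a separate object next to df1 rather than cbind-ing it in, because a data frame stores every column densely.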

Solution 1: 1 accepted, 2019-03-14 23:04:33