
A more efficient way to populate a matrix

I am trying to cbind a very large matrix with a data frame, and I am running into memory issues because of the size of the matrix.

I have the following data:

set.seed(123)
df1 <- data.frame(replicate(5, sample(1:20, 10, rep=TRUE)))
colnames(df1) <- c("col1", "col2", "col3", "col4", "important_col")
df2 <- data.frame(replicate(20, sample(0:0, nrow(df1), rep=TRUE)))
colnames(df2) <- gsub("X", "", colnames(df2))
df_fin <- cbind(df1, df2)

The following works and does what I want on a small sample, but when applied to hundreds of thousands of rows and 1,000+ columns I run into memory issues.

vecp <- colnames(df2)

imp_col <- df1$important_col

matrix <-  matrix(vecp, byrow = TRUE,
                           nrow = length(imp_col),
                           ncol = length(vecp),
                           dimnames = list(1:length(imp_col), vecp))

d <- ifelse(matrix == imp_col, 1, 0)


df_fin <- cbind(df1, d)

The part I am trying to make more efficient (it is where I run into the memory issues) is the line d <- ifelse(matrix == imp_col, 1, 0).

Is there a way I can make the matrix a sparse matrix before I apply the ifelse statement?

I build a matrix like the following:

   col1 col2 col3 col4 important_col 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
1    11   14    3   11             1 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0
2     1    1   19   15             4 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0
3     3   17   10   10             6 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0
4    13   10    8   17            10 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0
5    18    5    3   18            19 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0
6    11   10    9    5            17 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0
7     5   11   18   16            17 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0
8     5    8   13    8             6 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0
9    10    1    7   16            12 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0
10    4   17   17    3             4 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0

The end product looks like this:

   col1 col2 col3 col4 important_col 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
1     6   20   18   20             3 0 0 1 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0
2    16   10   14   19             9 0 0 0 0 0 0 0 0 1  0  0  0  0  0  0  0  0  0  0  0
3     9   14   13   14             9 0 0 0 0 0 0 0 0 1  0  0  0  0  0  0  0  0  0  0  0
4    18   12   20   16             8 0 0 0 0 0 0 0 1 0  0  0  0  0  0  0  0  0  0  0  0
5    19    3   14    1             4 0 0 0 1 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0
6     1   18   15   10             3 0 0 1 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0
7    11    5   11   16             5 0 0 0 0 1 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0
8    18    1   12    5            10 0 0 0 0 0 0 0 0 0  1  0  0  0  0  0  0  0  0  0  0
9    12    7    6    7             6 0 0 0 0 0 1 0 0 0  0  0  0  0  0  0  0  0  0  0  0
10   10   20    3    5            18 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  1  0  0

I will then convert this into a sparse matrix.

The problem is that d is the same size as your matrix, so if your matrix is huge then you'll have two of them. One possible option (although probably slower) is to iterate through the columns and change them one at a time; this only creates objects the same size as one column of your matrix. You could give this a try:

for (i in 1:ncol(matrix)) matrix[, i] <- matrix[, i] == imp_col
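To try the loop on a small input, here is a minimal sketch that assumes the candidate values are integers rather than column-name strings. The names m and vals are hypothetical, and imp_col is copied from the question's expected output. In the question the matrix is built from colnames(df2), so it is a character matrix, and assigning the comparison result back into it would store "TRUE"/"FALSE" strings instead of numbers.

```r
# Toy data: imp_col copies the important_col values from the expected output.
imp_col <- c(3L, 9L, 9L, 8L, 4L, 3L, 5L, 10L, 6L, 18L)
vals <- 1:20  # hypothetical: one candidate value per indicator column

# Integer matrix whose i-th column is filled with vals[i].
m <- matrix(vals, byrow = TRUE,
            nrow = length(imp_col), ncol = length(vals),
            dimnames = list(NULL, as.character(vals)))

# Overwrite one column at a time; each temporary is only one column long.
for (i in seq_len(ncol(m))) m[, i] <- m[, i] == imp_col

storage.mode(m)  # still "integer": the comparison results are stored as 0/1
```

Using seq_len(ncol(m)) instead of 1:ncol(m) is only a defensive choice so the loop is skipped when there are no columns.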

The expression returns a logical vector, but if your matrix is made of integers then the values will be converted to 0 and 1 when they are assigned back.
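Another option, not part of the approach above, is to skip the comparison matrix entirely. Since each row has at most one 1, the position of that 1 can be looked up with match() and written directly. The sketch below is a hedged illustration: the toy imp_col is copied from the question's expected output, the variable names i, j, keep and d_sparse are hypothetical, and the sparse step assumes the Matrix package is installed (it is guarded so the rest still runs without it).

```r
# Toy data: imp_col copies the important_col values from the expected output.
imp_col <- c(3L, 9L, 9L, 8L, 4L, 3L, 5L, 10L, 6L, 18L)
vecp <- as.character(1:20)  # indicator column names, as in the question

# Column position of each row's value; NA when a value has no matching column.
j <- match(as.character(imp_col), vecp)
keep <- !is.na(j)
i <- seq_along(imp_col)[keep]
j <- j[keep]

# Dense version: allocate the zeros once, then set only the matching cells.
d <- matrix(0L, nrow = length(imp_col), ncol = length(vecp),
            dimnames = list(NULL, vecp))
d[cbind(i, j)] <- 1L

# Sparse version (assumes Matrix is available): only the nonzero cells are stored.
if (requireNamespace("Matrix", quietly = TRUE)) {
  d_sparse <- Matrix::sparseMatrix(i = i, j = j, x = 1,
                                   dims = c(length(imp_col), length(vecp)),
                                   dimnames = list(NULL, vecp))
}
```

The dense d can be bound to the data frame with cbind(df1, d) as in the question. If the end goal is a sparse matrix anyway, building d_sparse from the index pairs avoids allocating the full dense matrix at all.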
