A more efficient way to populate a matrix
I am trying to cbind a very large matrix with a data frame I have, and I am running into memory issues due to the size of the matrix.
I have data:
set.seed(123)
df1 <- data.frame(replicate(5, sample(1:20, 10, rep=TRUE)))
colnames(df1) <- c("col1", "col2", "col3", "col4", "important_col")
df2 <- data.frame(replicate(20, sample(0:0, nrow(df1), rep=TRUE)))
colnames(df2) <- gsub("X", "", colnames(df2))
df_fin <- cbind(df1, df2)
The following works and does what I want on a small sample, but when applied to hundreds of thousands of rows and 1000+ columns I run into memory issues.
vecp <- colnames(df2)
imp_col <- df1$important_col
matrix <- matrix(vecp, byrow = TRUE,
nrow = length(imp_col),
ncol = length(vecp),
dimnames = list(1:length(imp_col), vecp))
d <- ifelse(matrix == imp_col, 1, 0)
df_fin <- cbind(df1, d)
The line I am trying to make more efficient (and where I run into the memory issues) is d <- ifelse(matrix == imp_col, 1, 0).
Is there a way I can make the matrix a sparse matrix before I apply the ifelse statement?
I build a matrix like the following:
col1 col2 col3 col4 important_col 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
1 11 14 3 11 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2 1 1 19 15 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3 3 17 10 10 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
4 13 10 8 17 10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5 18 5 3 18 19 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
6 11 10 9 5 17 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
7 5 11 18 16 17 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
8 5 8 13 8 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
9 10 1 7 16 12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
10 4 17 17 3 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
The end product is like:
col1 col2 col3 col4 important_col 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
1 6 20 18 20 3 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2 16 10 14 19 9 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
3 9 14 13 14 9 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
4 18 12 20 16 8 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
5 19 3 14 1 4 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
6 1 18 15 10 3 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
7 11 5 11 16 5 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
8 18 1 12 5 10 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
9 12 7 6 7 6 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
10 10 20 3 5 18 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
Which I will then make into a sparse matrix.
The problem is that d is the same size as your matrix, so if your matrix is huge then you will have two of them in memory. One possible option (although probably slower) is to iterate through the columns and change them one at a time; this only creates objects the same size as one column of your matrix. You could give this a try:
for (i in 1:ncol(matrix)) matrix[, i] <- matrix[, i] == imp_col
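The expression returns a logical vector, but if your matrix is made of integers then the values will be converted to 0 and 1. Note that the matrix built in the question holds the column names as character strings, so there the results would be stored as the strings "TRUE" and "FALSE" instead.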
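Since the end goal is a sparse matrix, another option is to skip the dense character matrix and the ifelse call altogether and build the indicator columns directly from their non-zero positions. The following is a minimal sketch, not the original poster's method: it assumes the Matrix package is available, that each value of important_col should set at most one indicator column, and it introduces the names j, keep and d_sparse purely for illustration.

```r
library(Matrix)  # assumption: the Matrix package is installed

# Same example data as in the question
set.seed(123)
df1 <- data.frame(replicate(5, sample(1:20, 10, rep = TRUE)))
colnames(df1) <- c("col1", "col2", "col3", "col4", "important_col")

vecp <- as.character(1:20)       # indicator column names, as in the question
imp_col <- df1$important_col

# Column position of each row's value; NA when no column name matches
j <- match(as.character(imp_col), vecp)
keep <- !is.na(j)                # hypothetical helper: rows that have a match

# Store only the non-zero cells: one (row, column) pair per matching row
d_sparse <- sparseMatrix(
  i = which(keep),
  j = j[keep],
  x = 1,
  dims = c(length(imp_col), length(vecp)),
  dimnames = list(NULL, vecp)
)
```

Memory use here grows with the number of rows rather than rows times columns, because only the cells equal to 1 are stored. A data frame cannot hold a sparse matrix as ordinary columns, so df1 would probably have to be kept alongside d_sparse or converted to a matrix itself before combining the two.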