MemoryError while converting sparse matrix to dense matrix? (numpy, scikit)
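# Imports assumed from the names used below (not shown in the original post)
import numpy as np
from sklearn import linear_model as lm, cross_validation
from sklearn.ensemble import AdaBoostClassifier

lr = lm.LogisticRegression(penalty='l2', dual=True, tol=0.0001,
                           C=1, fit_intercept=True, intercept_scaling=1.0,
                           class_weight=None, random_state=None)
rd = AdaBoostClassifier(base_estimator=lr,
                        learning_rate=1,
                        n_estimators=20,
                        algorithm="SAMME")
##here, I am deleting unnecessary objects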
##print X.shape
##(7395, 412605)
print "20 Fold CV Score: ", np.mean(cross_validation.cross_val_score(rd, X, y, cv=20, scoring='roc_auc'))
When I run this, I get this error:
TypeError: A sparse matrix was passed, but dense data is required. Use X.toarray() to convert to a dense numpy array.
Then I changed my code like this:
print "20 Fold CV Score: ", np.mean(cross_validation.cross_val_score(rd, X.toarray(), y, cv=20, scoring='roc_auc'))
Now I get the following exception:
File "/usr/lib/python2.7/dist-packages/scipy/sparse/compressed.py", line 559, in toarray
return self.tocoo(copy=False).toarray(order=order, out=out)
File "/usr/lib/python2.7/dist-packages/scipy/sparse/coo.py", line 235, in toarray
B = self._process_toarray_args(order, out)
File "/usr/lib/python2.7/dist-packages/scipy/sparse/base.py", line 628, in _process_toarray_args
return np.zeros(self.shape, dtype=self.dtype, order=order)
MemoryError
Any suggestions to solve the issue?
MemoryError means that there isn't enough RAM available on your system to allocate the matrix. Why? Well, a 7395 x 412605 matrix has 3,051,213,975 elements. If they're in the default float64 (usually double in C) datatype, that's 22.7GB. If you convert to lower-precision float32s (usually float in C), it'd be 11.4GB; maybe that's manageable on your machine. It'll still be really slow, though.
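To check the arithmetic above, here is a minimal sketch that computes how much memory a dense array of that shape would need. The helper name dense_size_gib is made up for illustration; the figures are in GiB (2**30 bytes), which is what the "GB" numbers above correspond to.

```python
import numpy as np

def dense_size_gib(shape, dtype):
    """Memory needed by a dense array of this shape and dtype, in GiB.

    Hypothetical helper for illustration, not part of numpy or scikit-learn.
    """
    n_elements = 1
    for dim in shape:
        n_elements *= dim  # plain Python ints, so no overflow
    return n_elements * np.dtype(dtype).itemsize / float(2 ** 30)

shape = (7395, 412605)  # X.shape from the question
size_f64 = dense_size_gib(shape, np.float64)
size_f32 = dense_size_gib(shape, np.float32)
print("float64: %.1f GiB, float32: %.1f GiB" % (size_f64, size_f32))
# prints: float64: 22.7 GiB, float32: 11.4 GiB
```

Nothing is allocated here, so it is safe to run even on a machine that could never hold the dense matrix.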
It seems that AdaBoostClassifier doesn't support sparse inputs (as you can see in the code here). I don't know offhand if dense representations are necessary for the algorithm or if it's just that the implementation assumed that.
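If you do have to hand dense data to an estimator, one option is to check the cost before calling toarray(). The following is only a sketch: the function name to_dense_if_small and the max_bytes budget are hypothetical, and it assumes float32 precision is acceptable for your features.

```python
import numpy as np
import scipy.sparse as sp

def to_dense_if_small(X, max_bytes, dtype=np.float32):
    """Densify X only if the dense copy fits within max_bytes.

    Hypothetical helper for illustration; max_bytes is a budget you choose.
    """
    if not sp.issparse(X):
        return np.asarray(X, dtype=dtype)
    needed = X.shape[0] * X.shape[1] * np.dtype(dtype).itemsize
    if needed > max_bytes:
        raise MemoryError("dense copy needs %d bytes, budget is %d"
                          % (needed, max_bytes))
    # Cast while still sparse so the dense array is allocated in the smaller dtype
    return X.astype(dtype).toarray()

# Tiny stand-in for the real data: a 4 x 4 sparse identity matrix
X_small = sp.csr_matrix(np.eye(4))
dense = to_dense_if_small(X_small, max_bytes=1024)
```

With the matrix from the question, any budget below roughly 11.4GB would make this raise immediately instead of failing partway through the allocation.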