Math domain error when using PCA

Question

I am using Python's scikit-learn package to implement PCA. I am getting a math domain error:
C:\Users\Akshenndra\Anaconda2\lib\site-packages\sklearn\decomposition\pca.pyc in _assess_dimension_(spectrum, rank, n_samples, n_features)
     78         for j in range(i + 1, len(spectrum)):
     79             pa += log((spectrum[i] - spectrum[j]) *
---> 80                       (1. / spectrum_[j] - 1. / spectrum_[i])) + log(n_samples)
     81 
     82     ll = pu + pl + pv + pp - pa / 2. - rank * log(n_samples) / 2.

ValueError: math domain error

I already know that a math domain error is raised when we take the logarithm of a negative number, but I don't understand how there can be a negative number inside the logarithm here, because this code works fine for other datasets. Maybe this is related to what is written on the scikit-learn website: "This implementation uses the scipy.linalg implementation of the singular value decomposition. It only works for dense arrays and is not scalable to large dimensional data." (My data contains a large number of 0 values.)

Answer 1

I think you should add 1 instead, as described on the numpy log1p description page, since log(p + 1) = 0 when p = 0, whereas log(1e-99) is a large negative number (about -228 for the natural logarithm). As that page says:

For real-valued input, log1p is accurate also for x so small that 1 + x == 1 in floating-point accuracy.
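The quoted property can be checked with a small sketch. It uses the standard library's math.log1p instead of numpy.log1p to stay dependency-free, on the assumption that the two behave the same way for a single float; the value of x is only an illustrative choice.

```python
import math

x = 1e-20  # so small that 1 + x rounds to exactly 1.0 in double precision

naive = math.log(1 + x)   # the addition discards x before the log is taken
stable = math.log1p(x)    # evaluates log(1 + x) without forming 1 + x first

print(naive)   # 0.0
print(stable)  # approximately 1e-20
```

The naive form loses the input entirely, while log1p keeps it.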

The code can be modified as follows to handle the case you are running into more reasonably:

for i in range(rank):
    for j in range(i + 1, len(spectrum)):
        # the + 1 keeps the argument of log at or above 1 when two eigenvalues coincide
        pa += log((spectrum[i] - spectrum[j]) *
                  (1. / spectrum_[j] - 1. / spectrum_[i]) + 1) + log(n_samples + 1)

ll = pu + pl + pv + pp - pa / 2. - rank * log(n_samples + 1) / 2
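As a self-contained sketch of the same idea, the double loop can be wrapped in a function and run on a small spectrum that contains a tie. The function name shifted_pair_sum and the sample values are hypothetical and are not part of scikit-learn; math.log1p(term) stands in for log(term + 1).

```python
import math

def shifted_pair_sum(spectrum, spectrum_, rank, n_samples):
    """Sum the shifted log terms over all pairs (i, j) with i < rank and i < j."""
    pa = 0.0
    for i in range(rank):
        for j in range(i + 1, len(spectrum)):
            term = (spectrum[i] - spectrum[j]) * (1.0 / spectrum_[j] - 1.0 / spectrum_[i])
            # log1p(term) == log(term + 1), which is 0 when the eigenvalues tie
            pa += math.log1p(term) + math.log(n_samples + 1)
    return pa

# Hypothetical spectrum in which the last two eigenvalues are equal
values = [2.0, 0.5, 0.5]
pa = shifted_pair_sum(values, values, rank=2, n_samples=10)
print(pa)  # finite, no ValueError
```

Note that log(x + 1) is not the same quantity as log(x), so this avoids the exception at the cost of changing the value that the original formula computes.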

Answer 2

I don't know whether I am right or not, but I did find a way to solve it.

I printed some debugging information (the values of spectrum_[i] and spectrum_[j]) and found that:

sometimes they are the same!

(Maybe they are not exactly the same but just too close, I guess.)

So here:

pa += log((spectrum[i] - spectrum[j]) *
                  (1. / spectrum_[j] - 1. / spectrum_[i])) + log(n_samples)

it reports an error when calculating log(0).
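A minimal sketch that reproduces this outside scikit-learn is shown here. The helper pair_term is hypothetical, and it assumes for simplicity that spectrum and spectrum_ hold the same values for the pair in question.

```python
import math

def pair_term(s_i, s_j):
    """The quantity inside the log for one (i, j) pair of eigenvalues."""
    return (s_i - s_j) * (1.0 / s_j - 1.0 / s_i)

print(pair_term(2.0, 0.5))  # positive for distinct eigenvalues
tied = pair_term(0.5, 0.5)  # exactly 0.0 when the two values are equal

try:
    math.log(tied)
    raised = False
except ValueError:
    raised = True  # math.log rejects 0 as well as negative inputs

print(raised)
```

So the argument does not have to be negative: a pair of equal eigenvalues is enough to trigger the ValueError.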

My way to solve it is to add a very small number, 1e-99, so that it becomes log(0 + 1e-99).

So you can just change it to:

pa += log((spectrum[i] - spectrum[j]) *
                  (1. / spectrum_[j] - 1. / spectrum_[i]) + 1e-99) + log(n_samples)
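To get a feel for what this offset does numerically, the following sketch may help. The constant name EPS and the sample term 0.5 are illustrative choices only.

```python
import math

EPS = 1e-99  # the offset suggested above

# For an ordinary-sized term the offset is lost to rounding, so nothing changes
unchanged = (0.5 + EPS == 0.5)
print(unchanged)  # True

# For a term that is exactly zero, the log becomes finite but strongly negative
guarded = math.log(0.0 + EPS)
print(guarded)  # roughly -227.96 (natural logarithm)
```

In other words, the offset only matters for the tied pairs, and each tied pair then contributes a large negative value to pa instead of raising an exception.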

2 answers

Answer 1
1 2017-10-21 16:48:52

Answer 2
0 2017-07-25 15:42:07
