
sklearn PCA with n_components = 'mle' and svd_solver = 'full' results in math domain error

My question is closely related to "math domain error while using PCA".

I get the following error:

  File "$path$\Python\Python36\lib\site-packages\sklearn\decomposition\pca.py", line 88, in _assess_dimension_
    (1. / spectrum_[j] - 1. / spectrum_[i])) + log(n_samples)
ValueError: math domain error

which refers to this line of code:

pa += log((spectrum[i] - spectrum[j]) * (1. / spectrum_[j] - 1. / spectrum_[i])) + log(n_samples)

After looking closer, I found that the problem is caused by this part of the expression:

(spectrum[i] - spectrum[j])

which evaluates to 0 if the two values are equal. The product inside the log is then 0, and log(0) is what raises this exception.
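To make the failure concrete, here is a minimal sketch in plain Python. `log_term` is a hypothetical helper, not sklearn code: it only mirrors the shape of the expression above and, as a simplification, uses the same values for both `spectrum` and `spectrum_`. The exact exception message may differ between Python versions.

```python
import math

def log_term(spec_i, spec_j, n_samples):
    """Simplified stand-in for the failing expression (hypothetical helper)."""
    gap = spec_i - spec_j                    # 0.0 when the two values are equal
    inv_gap = 1.0 / spec_j - 1.0 / spec_i    # also 0.0 in that case
    return math.log(gap * inv_gap) + math.log(n_samples)

# Distinct values: the product is positive, so the log is defined.
print(log_term(2.0, 1.0, 100))

# Equal values: the product is 0.0 and math.log raises ValueError.
try:
    log_term(1.5, 1.5, 100)
except ValueError as exc:
    print("ValueError:", exc)
```

So any two identical entries in the spectrum are enough to trigger the exception, regardless of the other values.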

Now my question: is the fact that this error can occur a sign that my data is bad, or should the implementation handle this case? If the implementation should handle it, how would you recommend handling it properly? The linked question already has an answer, but its author does not sound confident that it is right, and it has not received any feedback.

I created an issue on the scikit-learn GitHub repository with steps to reproduce the error.

This is caused by an open issue in sklearn. This is confirmed here

A fix for this issue was introduced in scikit-learn 0.23.0, so simply updating to that version or later resolves it.
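If you want to check from code whether an installation is new enough, something like the sketch below could work. `has_mle_fix` is a hypothetical helper, and the only thing it assumes is the 0.23 threshold stated above; it looks at the major and minor numbers only, so it does not distinguish development or pre-release builds.

```python
import re

def has_mle_fix(version: str) -> bool:
    """Return True if a version string is 0.23 or newer (hypothetical helper)."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    # Tuple comparison: (0, 23) and anything later counts as fixed.
    return (major, minor) >= (0, 23)

print(has_mle_fix("0.22.2.post1"))
print(has_mle_fix("0.23.0"))
```

In practice you would pass `sklearn.__version__` to it after `import sklearn`.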

Release Notes for scikit-learn 0.23

[MRG+1] Adress decomposition.PCA mle option problem #16224
