Having problems with dimensions in machine learning (Python Scikit)
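I am fairly new to applied machine learning, so I am trying to teach myself how to do linear regression on any kind of data from mldata.org using the Python scikit package. I tested the linear regression example code ( http://scikit-learn.org/stable/auto_examples/linear_model/plot_ols.html ), and it worked well with the diabetes dataset. However, when I tried to use the code with other datasets, such as the one about earthquakes on mldata ( http://mldata.org/repository/data/viewslug/global-earthquakes/ ), I could not get it to work because of a dimension problem.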
Warning (from warnings module):
File "/usr/lib/python2.7/dist-packages/numpy/core/_methods.py", line 55
warnings.warn("Mean of empty slice.", RuntimeWarning)
RuntimeWarning: Mean of empty slice.
Warning (from warnings module):
File "/usr/lib/python2.7/dist-packages/numpy/core/_methods.py", line 65
ret, rcount, out=ret, casting='unsafe', subok=False)
RuntimeWarning: invalid value encountered in true_divide
Traceback (most recent call last):
File "/home/anthony/Documents/Programming/Python/Machine Learning/Scikit/earthquake_linear_regression.py", line 38, in <module>
regr.fit(earthquake_X_train, earthquake_y_train)
File "/usr/local/lib/python2.7/dist-packages/sklearn/linear_model/base.py", line 371, in fit
linalg.lstsq(X, y)
File "/usr/lib/python2.7/dist-packages/scipy/linalg/basic.py", line 518, in lstsq
raise ValueError('incompatible dimensions')
ValueError: incompatible dimensions
How do I set up the dimensions of the data?
Size of the data:
earthquake shape (59209, 1, 4), earthquake shape (59189, 1), earthquake shape (3, 59209), earthquake shape (3, 59209)
Code:
# Code source: Jaques Grobler
# License: BSD 3 clause
import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets, linear_model
#Experimenting with earthquake data
from sklearn.datasets.mldata import fetch_mldata
import tempfile
test_data_home = tempfile.mkdtemp()
# Load the earthquake dataset
earthquake = fetch_mldata('Global Earthquakes', data_home = test_data_home)
# Use only one feature
earthquake_X = earthquake.data[:, np.newaxis]
earthquake_X_temp = earthquake_X[:, :, 2]
# Split the data into training/testing sets
earthquake_X_train = earthquake_X_temp[:-20]
earthquake_X_test = earthquake_X_temp[-20:]
# Split the targets into training/testing sets
earthquake_y_train = earthquake.target[:-20]
earthquake_y_test = earthquake.target[-20:]
print "Splitting of data for performance check completed"
# Create linear regression object
regr = linear_model.LinearRegression()
print "Created linear regression object"
# Train the model using the training sets
regr.fit(earthquake_X_train, earthquake_y_train)
print "Dataset trained"
# The coefficients
print('Coefficients: \n', regr.coef_)
# The mean square error
print("Residual sum of squares: %.2f"
      % np.mean((regr.predict(earthquake_X_test) - earthquake_y_test) ** 2))
# Explained variance score: 1 is perfect prediction
print('Variance score: %.2f' % regr.score(earthquake_X_test, earthquake_y_test))
# Plot outputs
plt.scatter(earthquake_X_test, earthquake_y_test, color='black')
plt.plot(earthquake_X_test, regr.predict(earthquake_X_test), color='blue',
         linewidth=3)
plt.xticks(())
plt.yticks(())
plt.show()
Your target array (earthquake_y_train) has the wrong shape. In fact, it is actually empty.
When you do
earthquake_y_train = earthquake.target[:-20]
you select all rows along the first axis except the last 20. And, according to the data you posted, earthquake.target has shape (3, 59209), so there are no rows left to select!
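To see this without downloading anything, here is a minimal sketch that uses a zero-filled NumPy array as a stand-in for earthquake.target. The array is an assumption; only its shape, (3, 59209), is taken from the question. An empty target like this would also be consistent with the "Mean of empty slice" warning in the output above.

```python
import numpy as np

# Stand-in for earthquake.target (assumption: only the shape comes from the question).
target = np.zeros((3, 59209))

# Slicing the first axis: there are only 3 rows, so dropping the last 20 leaves none.
y_train = target[:-20]
print(y_train.shape)       # (0, 59209)

# Slicing the second axis keeps all 3 rows and drops the last 20 columns.
y_train_cols = target[:, :-20]
print(y_train_cols.shape)  # (3, 59189)
```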
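But even if there were any rows, it would still be an error. Why? Because the first dimensions of X and y must be the same. According to sklearn's documentation, the fit method of LinearRegression expects X to have shape [n_samples, n_features] and y to have shape [n_samples, n_targets].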
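The shape requirement can be checked on random data. This is only a sketch: the arrays, the sample count of 100 and the variable names are made up for illustration and have nothing to do with the earthquake dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(0)
n_samples = 100  # arbitrary size chosen for the illustration

X = rng.rand(n_samples, 1)  # [n_samples, n_features]
y = rng.rand(n_samples, 3)  # [n_samples, n_targets]

# The first dimensions agree, so fit succeeds.
regr = LinearRegression().fit(X, y)
print(regr.coef_.shape)  # (3, 1): one row of coefficients per target

# A target laid out as [n_targets, n_samples] has a mismatched first dimension.
y_wrong = y.T
try:
    LinearRegression().fit(X, y_wrong)
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
print(mismatch_detected)
```

The exact error message depends on the scikit-learn version, which is why the sketch only catches ValueError rather than matching the text.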
To fix it, change the definitions of the ys to the following:
earthquake_y_train = earthquake.target[:, :-20].T
earthquake_y_test = earthquake.target[:, -20:].T
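As a quick sanity check of the resulting shapes, the same slicing can be tried on placeholder arrays. The zero-filled arrays below are assumptions standing in for earthquake.data and earthquake.target, sized according to the shapes reported in the question; `data[:, 2:3]` is just a shorter way to keep feature 2 as a two-dimensional column.

```python
import numpy as np

n = 59209
data = np.zeros((n, 4))    # stand-in for earthquake.data (assumed shape)
target = np.zeros((3, n))  # stand-in for earthquake.target (assumed shape)

# One feature column, kept 2-D so that X is [n_samples, n_features].
X = data[:, 2:3]
X_train, X_test = X[:-20], X[-20:]

# Slice the sample axis (the second one here), then transpose to [n_samples, n_targets].
y_train = target[:, :-20].T
y_test = target[:, -20:].T

print(X_train.shape, y_train.shape)  # (59189, 1) (59189, 3)
print(X_test.shape, y_test.shape)    # (20, 1) (20, 3)
```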
P.S. Even once you fix all of this, there is still a problem in your script: plt.scatter can't work with "multidimensional" ys.
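One possible way around that, sketched here with random placeholder data, is to plot a single target column at a time. The name `target_index` and the choice of column 0 are hypothetical; which target is worth plotting depends on the dataset.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.RandomState(0)
X_test = rng.rand(20, 1)  # placeholder for earthquake_X_test
y_test = rng.rand(20, 3)  # placeholder for the transposed earthquake_y_test

target_index = 0  # hypothetical choice of which target column to plot

# Both arguments are now 1-D arrays of the same length.
fig, ax = plt.subplots()
points = ax.scatter(X_test[:, 0], y_test[:, target_index], color='black')
print(points.get_offsets().shape)  # (20, 2)
```

In an interactive session the `matplotlib.use("Agg")` line can be dropped and `plt.show()` called as in the original script.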