
scikit-learn crashing on certain amounts of data

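I am trying to process a numpy array with 71,000 rows and 200 columns of floats. Both of the scikit-learn models I am trying fail with different errors as soon as I go past 5853 rows. I tried removing the offending row, but it still fails. Can scikit-learn not handle this much data, or is it something else? X is a numpy array built from a list of lists.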

KNN:

nbrs = NearestNeighbors(n_neighbors=2, algorithm='ball_tree').fit(X)

Error:

File "knn.py", line 48, in <module>
  nbrs = NearestNeighbors(n_neighbors=2, algorithm='ball_tree').fit(X)
File "/usr/local/lib/python2.7/dist-packages/sklearn/neighbors/base.py", line 642, in fit
  return self._fit(X)
File "/usr/local/lib/python2.7/dist-packages/sklearn/neighbors/base.py", line 180, in _fit
  raise ValueError("data type not understood")

ValueError: data type not understood

K-Means:

kmeans_model = KMeans(n_clusters=2, random_state=1).fit(X)

Error:

Traceback (most recent call last):
File "knn.py", line 48, in <module>
kmeans_model = KMeans(n_clusters=2, random_state=1).fit(X)
File "/usr/local/lib/python2.7/dist-packages/sklearn/cluster/k_means_.py", line 702, in fit
X = self._check_fit_data(X)
File "/usr/local/lib/python2.7/dist-packages/sklearn/cluster/k_means_.py", line 668, in _check_fit_data
X = atleast2d_or_csr(X, dtype=np.float64)
File "/usr/local/lib/python2.7/dist-packages/sklearn/utils/validation.py", line 134, in atleast2d_or_csr
"tocsr", force_all_finite)
File "/usr/local/lib/python2.7/dist-packages/sklearn/utils/validation.py", line 111, in _atleast2d_or_sparse
force_all_finite=force_all_finite)
File "/usr/local/lib/python2.7/dist-packages/sklearn/utils/validation.py", line 91, in array2d
X_2d = np.asarray(np.atleast_2d(X), dtype=dtype, order=order)
File "/usr/local/lib/python2.7/dist-packages/numpy/core/numeric.py", line 235, in asarray
return array(a, dtype, copy=False, order=order)
ValueError: setting an array element with a sequence.

Please check the dtype of your matrix X, e.g. by typing X.dtype. If it is object, i.e. dtype('O'), then write the lengths of the rows of X into a list:

lengths = [len(line) for line in X]

Then check whether all rows have the same length by calling

np.unique(lengths)

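If the output contains more than one number, then your rows have different lengths, e.g. from row 5853 onwards, though possibly not in every row after that.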
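To go from "the lengths differ" to "which rows are the problem", something like the following may help. It is only a sketch: `X_rows` is a small hypothetical stand-in for your list of lists, and treating the most common length as the expected one is an assumption (in your case you could simply set it to 200).

```python
import numpy as np
from collections import Counter

# Hypothetical stand-in for the real data: 10 rows of 4 values,
# with row 7 deliberately one element short.
X_rows = [[0.0] * 4 for _ in range(10)]
X_rows[7] = X_rows[7][:-1]

lengths = np.array([len(row) for row in X_rows])

# Assumption: the most common row length is the intended one.
expected_len = Counter(lengths.tolist()).most_common(1)[0][0]

# Indices of all rows whose length differs from the expected one.
bad_rows = np.flatnonzero(lengths != expected_len)
print(expected_len, bad_rows.tolist())
```

If `bad_rows` lists more than one index, removing a single row (as you tried) would not be enough, which would explain why the error persisted.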

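Numpy arrays are only useful if all rows have the same length (if they don't, the array can still be created, but it will not behave as expected). You should find out what is causing this, correct it, and then go back to knn.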

Here is an example of what happens when the row lengths are not all the same:

import numpy as np
rng = np.random.RandomState(42)
X = rng.randn(100, 20)
# now remove one element from the 56th row
X = list(X)
X[55] = X[55][:-1]
# turn it back into an ndarray; recent NumPy versions refuse to build a
# ragged array unless dtype=object is requested explicitly
X = np.array(X, dtype=object)
# check the dtype
print(X.dtype)  # prints "object", i.e. dtype('O')

from sklearn.neighbors import NearestNeighbors
nbrs = NearestNeighbors()
nbrs.fit(X)  # raises a ValueError like your first error

from sklearn.cluster import KMeans
kmeans = KMeans()
kmeans.fit(X)  # raises a ValueError like your second error
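For completeness, one possible way to repair the ragged example is sketched below. It simply drops rows of the wrong length, which is an assumption on my part. Whether dropping, padding, or fixing the loading code is right depends on why your rows are short. `expected_len` and `clean_rows` are illustrative names.

```python
import numpy as np

rng = np.random.RandomState(42)
rows = list(rng.randn(100, 20))
rows[55] = rows[55][:-1]  # same ragged row as in the example above

expected_len = 20  # assumed known number of columns (200 in the question)

# Keep only the rows that have the expected number of values.
clean_rows = [row for row in rows if len(row) == expected_len]

# With equal-length rows, NumPy builds a regular 2-D float array again.
X_clean = np.array(clean_rows, dtype=np.float64)
print(X_clean.shape, X_clean.dtype)
```

An array like `X_clean`, with a float dtype and two dimensions, is the kind of input that `NearestNeighbors().fit` and `KMeans().fit` expect.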
