Python Classifier Sklearn

I am quite new to Python and SKLearn. I am trying to make a simple classifier, but I am running into a problem. I have been following a few different tutorials, but I get an error when I try to use the .fit method. I am new to the concept and have tried the documentation but found it hard to understand. Can anyone help me with my error or point me in the right direction?

My guess is that some values are out of range for the dtype, because I have already transformed all the missing (NaN) values and the error still arises.

Code

def main():
    setup_files()

    imputer = Imputer()

    # the training data minus id and type:
    t_num_data = load_csv(training_set_file_path, range(1, 17))
    t_num_data_imputed = imputer.fit_transform(t_num_data)
    print(t_num_data_imputed)

    # the training type column
    t_type_col = load_csv(training_set_file_path, 17, dtype=np.dtype((str, 5)))
    # the query data minus id and type:
    q_data = load_csv(queries_file_path, range(1, 17))
    # the query id column
    q_id = load_csv(queries_file_path, 0, dtype=np.dtype((str, 10)))

    # fit data above to DTC and predict import
    model = tree.DecisionTreeClassifier(criterion='entropy')
    model.fit_transform(t_num_data, t_type_col)
    predictions = model.predict(q_data)

    # output the predictions:
    with open(solutions_file_path, 'w') as f:
        for i in range(len(predictions)):
            f.write("{},{}\n".format(q_id[i], predictions[i]))

    # fit data above to DTC and predict import
    model = tree.DecisionTreeClassifier(criterion='entropy')
    model.fit(t_num_data, t_type_col)
    predictions = model.predict(q_data)

    # output the predictions:
    with open(solutions_file_path, 'w') as f:
        for i in range(len(predictions)):
            f.write("{},{}\n".format(q_id[i], predictions[i]))

Error

Traceback (most recent call last):
  File "/Users/Rory/Desktop/classifier.py", line 71, in <module>
    main()
  File "/Users/Rory/Desktop/classifier.py", line 60, in main
    model.fit_transform(t_num_data, t_type_col)
  File "/Users/Rory/anaconda/lib/python2.7/site-packages/sklearn/base.py", line 458, in fit_transform
    return self.fit(X, y, **fit_params).transform(X)
  File "/Users/Rory/anaconda/lib/python2.7/site-packages/sklearn/tree/tree.py", line 154, in fit
    X = check_array(X, dtype=DTYPE, accept_sparse="csc")
  File "/Users/Rory/anaconda/lib/python2.7/site-packages/sklearn/utils/validation.py", line 398, in check_array
    _assert_all_finite(array)
  File "/Users/Rory/anaconda/lib/python2.7/site-packages/sklearn/utils/validation.py", line 54, in _assert_all_finite
    " or a value too large for %r." % X.dtype)
ValueError: Input contains NaN, infinity or a value too large for dtype('float32').

The problem is the NaN values in your data. There are many ways to fill in (impute) NaNs. As a quick fix, assuming t_num_data is a pandas DataFrame, you could try:

t_num_data = t_num_data.fillna(0)

This fills every missing value with 0 (fillna returns a new object, so the result has to be assigned back). Your classifier will then run, but it may not be very accurate. There are additional methods, such as filling with the column mean or estimating from nearest neighbors, but this should get your code working for now.
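It is also worth noting that the code in the question computes t_num_data_imputed but then passes the original t_num_data (and an un-imputed q_data) to the model, which would explain why the NaN error persists. Below is a minimal sketch of mean imputation followed by fitting on the imputed arrays. It assumes a recent scikit-learn release, where the Imputer class used in the question has been replaced by sklearn.impute.SimpleImputer; the small arrays are hypothetical stand-ins for t_num_data, t_type_col and q_data.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data standing in for t_num_data, t_type_col and q_data.
X_train = np.array([[1.0, np.nan], [2.0, 4.0], [np.nan, 6.0], [4.0, 8.0]])
y_train = np.array(["a", "a", "b", "b"])
X_query = np.array([[np.nan, 5.0], [3.0, np.nan]])

# Learn the column means on the training data, then reuse them for the queries.
imputer = SimpleImputer(strategy="mean")
X_train_imputed = imputer.fit_transform(X_train)
X_query_imputed = imputer.transform(X_query)

# Fit and predict on the imputed arrays, not on the originals containing NaN.
model = DecisionTreeClassifier(criterion="entropy", random_state=0)
model.fit(X_train_imputed, y_train)
predictions = model.predict(X_query_imputed)
```

Calling transform (rather than fit_transform) on the query data keeps the fill values consistent with what the model saw during training.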

