I am doing multi-class classification on unbalanced classes. I'm using SGDClassifier(), GradientBoostingClassifier(), RandomForestClassifier(), and LogisticRegression() with class_weight='balanced'. To compare the results, I need to compute the accuracy. I tried the following way to compute weighted accuracy:
n_samples = len(y_train)
weights_cof = float(n_samples)/(n_classes*np.bincount(data[target_label].as_matrix().astype(int))[1:])
sample_weights = np.ones((n_samples,n_classes)) * weights_cof
print accuracy_score(y_test, y_pred, sample_weight=sample_weights)
y_train is a binary array, so sample_weights has the same shape as y_train (n_samples, n_classes). When I ran the script, I received the following error:
Update:
Traceback (most recent call last):
File "C:\Program Files (x86)\JetBrains\PyCharm Community Edition 2016.3.2\helpers\pydev\pydevd.py", line 1596, in <module>
globals = debugger.run(setup['file'], None, None, is_module)
File "C:\Program Files (x86)\JetBrains\PyCharm Community Edition 2016.3.2\helpers\pydev\pydevd.py", line 974, in run
pydev_imports.execfile(file, globals, locals) # execute the script
File "D:/Destiny/DestinyScripts/MainLocationAware.py", line 424, in <module>
predict_country(featuresDF, score, featuresLabel, country_sample_size, 'gbc')
File "D:/Destiny/DestinyScripts/MainLocationAware.py", line 313, in predict_country
print accuracy_score(y_test, y_pred, sample_weight=sample_weights)
File "C:\ProgramData\Anaconda2\lib\site-packages\sklearn\metrics\classification.py", line 183, in accuracy_score
return _weighted_sum(score, sample_weight, normalize)
File "C:\ProgramData\Anaconda2\lib\site-packages\sklearn\metrics\classification.py", line 108, in _weighted_sum
return np.average(sample_score, weights=sample_weight)
File "C:\ProgramData\Anaconda2\lib\site-packages\numpy\lib\function_base.py", line 1124, in average
"Axis must be specified when shapes of a and weights "
TypeError: Axis must be specified when shapes of a and weights differ.
The error suggests that the shape of your sample_weights differs from the shape of your y_test / y_pred arrays. Basically, accuracy_score creates a boolean array from y_test == y_pred and passes it, along with sample_weights, to np.average. One of the first checks in that function is that the input array and the weights have the same shape, which in this case they apparently do not.
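The shape check can be reproduced in isolation with NumPy alone. The following is a minimal sketch on made-up toy data (the arrays and the names bad_weights and good_weights are hypothetical, not taken from the question). It only illustrates the mismatch between a 1-D score array and 2-D weights, and the exact exception type and message may vary between NumPy versions.

```python
import numpy as np

# Hypothetical toy data: 4 samples, 3 classes, labels as single integers.
y_test = np.array([0, 1, 2, 1])
y_pred = np.array([0, 2, 2, 1])

# Per-sample correctness, shape (4,).
sample_score = (y_test == y_pred)

# Weights shaped (n_samples, n_classes), like sample_weights in the question.
bad_weights = np.ones((4, 3))

try:
    np.average(sample_score, weights=bad_weights)
    error_message = None
except (TypeError, ValueError) as exc:
    # NumPy rejects weights whose shape differs from the data when no axis is given.
    error_message = str(exc)

# Weights with one entry per sample, shape (4,), are accepted.
good_weights = np.ones(4)
weighted = np.average(sample_score, weights=good_weights)

print(error_message)
print(weighted)
```

With one weight per sample the call succeeds, which is why the weights need to be 1-D as well.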
Your comment "sample_weights, y_test, and y_pred have the same shape (n_samples, n_classes)" exposes the issue. According to the documentation for accuracy_score, y_true and y_pred (in your case y_test and y_pred) should be 1-dimensional. Are you perhaps using one-hot encoded labels? If so, you should convert them to single-value labels and then try the accuracy score again.
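One way this conversion could look is sketched below, assuming the labels really are one-hot encoded. The toy arrays and the names y_test_onehot and y_pred_onehot are hypothetical. The sketch also assumes that the class weights are derived from the test labels and that every class appears in them at least once; the question computes them from a different array, so adapt that part as needed. The key point is that sample_weight ends up with one entry per sample, shape (n_samples,).

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical one-hot labels: 5 samples, 3 classes.
y_test_onehot = np.array([[1, 0, 0], [1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]])
y_pred_onehot = np.array([[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 1, 0], [1, 0, 0]])

# Convert one-hot rows to single integer labels, shape (n_samples,).
y_test = y_test_onehot.argmax(axis=1)
y_pred = y_pred_onehot.argmax(axis=1)

n_samples = len(y_test)
n_classes = y_test_onehot.shape[1]

# "Balanced" weight per class: n_samples / (n_classes * class_count).
# Assumes every class occurs in y_test, otherwise this divides by zero.
class_counts = np.bincount(y_test, minlength=n_classes)
class_weights = float(n_samples) / (n_classes * class_counts)

# Look up the weight of each sample's true class, shape (n_samples,).
sample_weights = class_weights[y_test]

weighted_acc = accuracy_score(y_test, y_pred, sample_weight=sample_weights)
print(weighted_acc)
```

With weights chosen this way, the weighted accuracy equals the mean of the per-class recalls, so rare classes count as much as frequent ones.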