
How to compute weighted accuracy for multi-class classification?

I am doing multi-class classification on imbalanced classes. I'm using SGDClassifier(), GradientBoostingClassifier(), RandomForestClassifier(), and LogisticRegression() with class_weight='balanced'. To compare the results, I need to compute the accuracy. I tried to compute a weighted accuracy as follows:

n_samples = len(y_train)
weights_cof = float(n_samples)/(n_classes*np.bincount(data[target_label].as_matrix().astype(int))[1:])
sample_weights = np.ones((n_samples,n_classes)) * weights_cof
print accuracy_score(y_test, y_pred, sample_weight=sample_weights)

y_train is a binary array, so sample_weights has the same shape as y_train: (n_samples, n_classes). When I run the script, I get the following error:

Update:

 Traceback (most recent call last):
  File "C:\Program Files (x86)\JetBrains\PyCharm Community Edition 2016.3.2\helpers\pydev\pydevd.py", line 1596, in <module>
    globals = debugger.run(setup['file'], None, None, is_module)
  File "C:\Program Files (x86)\JetBrains\PyCharm Community Edition 2016.3.2\helpers\pydev\pydevd.py", line 974, in run
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "D:/Destiny/DestinyScripts/MainLocationAware.py", line 424, in <module>
    predict_country(featuresDF, score, featuresLabel, country_sample_size, 'gbc')
  File "D:/Destiny/DestinyScripts/MainLocationAware.py", line 313, in predict_country
    print accuracy_score(y_test, y_pred, sample_weight=sample_weights)
  File "C:\ProgramData\Anaconda2\lib\site-packages\sklearn\metrics\classification.py", line 183, in accuracy_score
    return _weighted_sum(score, sample_weight, normalize)
  File "C:\ProgramData\Anaconda2\lib\site-packages\sklearn\metrics\classification.py", line 108, in _weighted_sum
    return np.average(sample_score, weights=sample_weight)
  File "C:\ProgramData\Anaconda2\lib\site-packages\numpy\lib\function_base.py", line 1124, in average
    "Axis must be specified when shapes of a and weights "
TypeError: Axis must be specified when shapes of a and weights differ.

The error suggests that the shape of sample_weights differs from the shape of the per-sample scores computed from y_test and y_pred. Internally, accuracy_score builds a boolean array marking which samples were predicted correctly and passes it, together with sample_weights, to np.average. When no axis is given, np.average requires the array and the weights to have the same shape, which is apparently not the case here.
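To illustrate the shape check, here is a minimal sketch that calls np.average directly. The toy labels and the names bad_weights and good_weights are made up for illustration; only np.average and the shapes come from the question and the traceback.

```python
import numpy as np

# Toy data (hypothetical): 4 samples, 3 classes, labels as class indices.
y_true = np.array([0, 1, 2, 2])
y_pred = np.array([0, 2, 2, 2])

# Per-sample correctness, shape (n_samples,).
sample_score = (y_true == y_pred)

# A 2-D weight array like the one in the question: shape (n_samples, n_classes).
bad_weights = np.ones((4, 3))

try:
    np.average(sample_score, weights=bad_weights)
    error_message = None
except (TypeError, ValueError) as exc:
    # The shapes (4,) and (4, 3) differ and no axis was given.
    error_message = str(exc)

# One weight per sample, shape (n_samples,), is accepted.
good_weights = np.ones(4)
weighted_mean = np.average(sample_score, weights=good_weights)

print(error_message)
print(weighted_mean)
```

With uniform weights the result is just the plain accuracy of the toy data (3 of 4 correct).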

Update

Your comment that "sample_weights, y_test, and y_pred have the same shape (n_samples, n_classes)" exposes the issue. For a multi-class problem, accuracy_score expects y_true and y_pred (in your case y_test and y_pred) to be 1-dimensional arrays of class labels, and according to its documentation sample_weight must hold one weight per sample, i.e. have shape (n_samples,). Are you perhaps using one-hot encoded labels? If so, convert them to single-value labels, build a 1-dimensional sample_weights array, and then try the accuracy score again.
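A minimal sketch of that conversion, assuming the labels really are one-hot encoded. The toy arrays and the names y_test_onehot, y_pred_onehot, class_weights, and so on are hypothetical. The "balanced" weighting mirrors the formula in the question (n_samples / (n_classes * class count)), but it is computed here from the test labels so that each weight lines up with one test sample. It is written for Python 3.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical one-hot targets and predictions, shape (n_samples, n_classes).
y_test_onehot = np.array([[1, 0, 0],
                          [1, 0, 0],
                          [1, 0, 0],
                          [0, 1, 0],
                          [0, 0, 1]])
y_pred_onehot = np.array([[1, 0, 0],
                          [1, 0, 0],
                          [0, 1, 0],
                          [0, 1, 0],
                          [1, 0, 0]])

# Collapse each row to a single class index, shape (n_samples,).
y_test_labels = y_test_onehot.argmax(axis=1)
y_pred_labels = y_pred_onehot.argmax(axis=1)

# "Balanced" class weights: n_samples / (n_classes * count_per_class).
# Assumes every class occurs at least once in y_test_labels.
n_classes = y_test_onehot.shape[1]
class_counts = np.bincount(y_test_labels, minlength=n_classes).astype(float)
class_weights = len(y_test_labels) / (n_classes * class_counts)

# One weight per test sample, looked up from its true class: shape (n_samples,).
sample_weights = class_weights[y_test_labels]

plain_acc = accuracy_score(y_test_labels, y_pred_labels)
weighted_acc = accuracy_score(y_test_labels, y_pred_labels,
                              sample_weight=sample_weights)

print(plain_acc, weighted_acc)
```

With these weights every class contributes equally to the score, so the weighted accuracy works out to the mean of the per-class recalls rather than the fraction of correct samples.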
