f1_score: ValueError: all the input arrays must have same number of dimensions

Question

I want to compute the f1_score.

The code is shown below:

import pandas as pd
from sklearn.metrics import accuracy_score, f1_score

if __name__ == '__main__':
    y_pred_df = pd.read_csv('file1.csv', skipinitialspace=True, sep='\t', header=None, dtype=str)
    y_pred = y_pred_df.values

    y_true_df = pd.read_csv('file2.csv', header=None, dtype=str)
    y_true = y_true_df.values

    test_score = accuracy_score(y_true[:,0], y_pred[:,0])
    print("\n Accuracy score (Random Forest with 100 estimators) : {}%".format(round(test_score*100,2)))

    print(y_true[:,0])
    print(y_pred[:,0])

    score_test = f1_score(y_true[:,0], y_pred[:,0], pos_label=list(set(y_true[:,0])), average='weighted')

    print(score_test)

When I run the code above, the following error is raised while computing f1_score:

Accuracy score (Random Forest with 100 estimators) : 61.62%
['4' '4' '4' '4' '4' '12' '12' '12' '12' '12' '12' '12' '12' '4' '4' '4'
'4' '4' '4' '4' '4' '4' '12' '12' '12' '12' '12' '12' '12' '12' '12' '12'
'12' '12' '12' '12' '12' '12' '12' '12' '12' '12' '12' '12' '12' '12' '12'
'12' '12' '12' '12' '4' '4' '4' '4' '4' '4' '4' '4' '4' '4' '4' '4' '4'
'4' '4' '4' '4' '4' '12' '12' '4' '4' '4' '12' '12' '12' '12' '12' '12'
'12' '12' '12' '12' '12' '12' '12' '12' '12' '12' '12' '12' '12' '4' '4'
'4' '4' '4' '4']

['4' '12' '12' '12' '12' '12' '12' '12' '12' '12' '12' '12' '12' '12' '12'
'12' '12' '12' '12' '12' '12' '12' '12' '12' '12' '12' '12' '12' '12' '12'
'12' '12' '12' '12' '12' '12' '12' '12' '12' '12' '12' '12' '12' '12' '12'
'12' '12' '12' '12' '12' '12' '12' '4' '12' '4' '12' '12' '12' '12' '12'
'12' '12' '12' '12' '12' '12' '12' '12' '12' '12' '12' '12' '12' '12' '12'
'12' '12' '12' '12' '12' '12' '12' '12' '12' '12' '12' '12' '12' '12' '12'
'12' '12' '12' '12' '12' '12' '12' '12' '12']
Traceback (most recent call last):
  File "<ipython-input-25-f80f0ca3aea2>", line 1, in <module>
runfile('C:/Anaconda3/envs/python27/Scripts/spade/examples/project/Fmeasure.py', wdir='C:/Anaconda3/envs/python27/Scripts/spade/examples/project')

  File "C:\Anaconda3\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 714, in runfile
    execfile(filename, namespace)

  File "C:\Anaconda3\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 89, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)

  File "C:/Anaconda3/envs/python27/Scripts/spade/examples/project/Fmeasure.py", line 47, in <module>
    score_test = f1_score(y_true[:,0], y_pred[:,0],pos_label=list(set(y_true[:,0])),average = 'binary')

  File "C:\Anaconda3\lib\site-packages\sklearn\metrics\classification.py", line 639, in f1_score
    sample_weight=sample_weight)

  File "C:\Anaconda3\lib\site-packages\sklearn\metrics\classification.py", line 756, in fbeta_score
    sample_weight=sample_weight)

  File "C:\Anaconda3\lib\site-packages\sklearn\metrics\classification.py", line 992, in precision_recall_fscore_support
    assume_unique=True)])

  File "C:\Anaconda3\lib\site-packages\numpy\core\shape_base.py", line 280, in hstack
    return _nx.concatenate(arrs, 1)

ValueError: all the input arrays must have same number of dimensions

Can you tell me what is causing the problem?

Answer 1

pos_label must be a single label, but you are passing a list of labels.
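To make "a single label" concrete, here is a minimal sketch on made-up toy labels (not the question's files). It compares f1_score with a hand-written binary F1 in which the chosen label is treated as the positive class. The helper manual_f1 is hypothetical and exists only for this check.

```python
from sklearn.metrics import f1_score

# Hypothetical toy labels, stored as strings like in the question
y_true = ['4', '4', '12', '12', '12', '4']
y_pred = ['4', '12', '12', '12', '4', '4']


def manual_f1(y_true, y_pred, positive):
    """Binary F1 treating `positive` as the positive class (illustration only)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# pos_label is one label ('4'), not a list such as ['4', '12']
sk_score = f1_score(y_true, y_pred, pos_label='4')
print(sk_score, manual_f1(y_true, y_pred, '4'))
```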

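pos_label is meant for computing the F1 score of one label at a time, and it crashes when you pass a list. If you want the F1 score of every label, loop over the set of labels, like this: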

from sklearn.metrics import f1_score

for label in sorted(set(yt_)):
    # pos_label takes a single label, not a list
    score_test = f1_score(yt_, yp_, pos_label=label)
    print('f1', label, score_test)
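As an alternative to the loop, f1_score also accepts average=None, which returns an array of per-label scores ordered like the labels argument. A minimal sketch on hypothetical toy data (not the question's files), checking that both approaches agree:

```python
from sklearn.metrics import f1_score

# Hypothetical toy data, for illustration only
yt_ = ['4', '4', '4', '12', '12', '12', '12', '4']
yp_ = ['4', '12', '12', '12', '12', '12', '4', '4']

labels = sorted(set(yt_))  # string sort, so '12' comes before '4'

# One binary F1 per label, with that label as the positive class
loop_scores = [f1_score(yt_, yp_, pos_label=label) for label in labels]

# Same scores from a single call; the array follows the order of `labels`
array_scores = f1_score(yt_, yp_, labels=labels, average=None)

for label, a, b in zip(labels, loop_scores, array_scores):
    print('f1', label, a, b)
```

The loop works here because there are exactly two classes; with more classes, average=None (or a loop using labels=[label]) is the way to get per-label scores.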

If what you want is the weighted average of the F1 scores, then you should not pass pos_label at all:

score_test = f1_score(yt_, yp_, average = 'weighted')
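Here, "weighted" means each label's F1 score is weighted by its support, that is, the number of true samples of that label. A minimal sketch on hypothetical toy data that recomputes the weighted score from the per-label F1 scores and supports returned by precision_recall_fscore_support:

```python
import numpy as np
from sklearn.metrics import f1_score, precision_recall_fscore_support

# Hypothetical toy data with unequal class counts, for illustration only
yt_ = ['4', '4', '12', '12', '12', '12', '12', '4']
yp_ = ['4', '12', '12', '12', '12', '4', '12', '12']

weighted = f1_score(yt_, yp_, average='weighted')
macro = f1_score(yt_, yp_, average='macro')  # plain mean, for contrast

# Per-label F1 and support, in sorted label order ('12', '4')
_, _, f1_per_label, support = precision_recall_fscore_support(yt_, yp_, average=None)

# Support-weighted mean of the per-label F1 scores
manual = np.sum(f1_per_label * support) / np.sum(support)

print(weighted, manual, macro)
```

Because the two classes have different supports here, the weighted score differs from the macro (unweighted) average.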

However, on sklearn 0.20 the following does run, although it emits a warning:

from sklearn.metrics import f1_score

if __name__ == '__main__':
    yt_ =  ['4', '4', '4', '4', '4', '12', '12', '12', '12', '12', '12', '12', '12', '4', '4', '4', '4', '4', '4', '4', '4', '4', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '4', '4', '4', '4', '4', '4', '4', '4', '4', '4', '4', '4', '4', '4', '4', '4', '4', '4', '12', '12', '4', '4', '4', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '4', '4', '4', '4', '4', '4']

    yp_ = ['4', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '4', '12', '4', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12'] 

    score_test = f1_score(yt_, yp_, pos_label=list(set(yt_)),average = 'weighted')

    print (score_test)

Warning:

UserWarning: Note that pos_label (set to ['12', '4']) is ignored when average != 'binary' (got 'weighted'). You may use labels=[pos_label] to specify a single positive class.
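The warning suggests labels=[pos_label] for cases where you care about a single class. A minimal sketch on hypothetical toy data, assuming '4' is the class of interest, shows that averaging over just that one label gives the same value as the binary F1 with pos_label='4'. Newer scikit-learn releases may validate pos_label more strictly, so passing a list there may raise an error instead of only warning. Passing a single label avoids the issue either way.

```python
import math

from sklearn.metrics import f1_score

# Hypothetical toy data, for illustration only
yt_ = ['4', '4', '4', '12', '12', '12', '12', '4']
yp_ = ['4', '12', '12', '12', '12', '12', '4', '4']

positive = '4'  # hypothetical choice of the class of interest

# Average restricted to one label, as the warning suggests
restricted = f1_score(yt_, yp_, labels=[positive], average='weighted')

# Binary F1 with that label as the positive class
binary = f1_score(yt_, yp_, pos_label=positive)

print(restricted, binary, math.isclose(restricted, binary))
```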

Solution 1
0 2018-10-11 12:36:02
