My sklearn_crfsuite model does not learn anything

Question

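I am trying to build an annotation prediction model by following the tutorial here, but my model does not learn anything. Here is a sample of my training data and labels: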

[{'bias': 1.0,
  'word.lower()': '\\nreference\\nissue\\ndate\\ndgt86620\\n4\\n \\n19-dec-05\\nfalcon\\n7x\\ntype\\ncertification\\n27_4-100\\nthis\\ndocument\\nis\\nthe\\nintlectual\\nprop...nairbrakes\\nhandle\\nposition\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n0\\ntable\\n1\\n:\\nairbrake\\ncas\\nmessages\\n',
  'word[-3:]': 'es\\n', 'word[-2:]': 's\\n',
  'word.isupper()': False, 'word.istitle()': False, 'word.isdigit()': False,
  'postag': 'POS', 'postag[:2]': 'PO',
  'w_emb_0': 0.03418987928976114, 'w_emb_1': 0.6173382811066742, 'w_emb_2': 0.004420982990809508, 'w_emb_3': 0.08293022662242588,
  'w_emb_4': 0.22162269482070363, 'w_emb_5': 0.4334545347397811, 'w_emb_6': 0.7844891779932379, 'w_emb_7': 0.028043262790094503,
  'w_emb_8': 0.5233847386564157, 'w_emb_9': 0.9685677133128328, 'w_emb_10': 0.19379126558708126, 'w_emb_11': 0.2809608896964926,
  'w_emb_12': 0.384759230815804, 'w_emb_13': 0.15385904662767336, 'w_emb_14': 0.5206500040610533, 'w_emb_15': 0.009148526006733215,
  'w_emb_16': 0.5894118695171416, 'w_emb_17': 0.7356989708459056, 'w_emb_18': 0.5576774100159024, 'w_emb_19': 0.2185294430010376,
  'BOS': True,
  '+1:word.lower()': 'reference', '+1:word.istitle()': False, '+1:word.isupper()': True,
  '+1:postag': 'POS', '+1:postag[:2]': 'PO'},
 {'bias': 1.0,
  'word.lower()': 'reference',
  'word[-3:]': 'NCE', 'word[-2:]': 'CE',
  'word.isupper()': True, 'word.istitle()': False, 'word.isdigit()': False,
  'postag': 'POS', 'postag[:2]': 'PO',
  'w_emb_0': -0.390038, 'w_emb_1': 0.30677223, 'w_emb_2': -1.010975, 'w_emb_3': 0.3656154,
  'w_emb_4': 0.5319459, 'w_emb_5': 0.45572615, 'w_emb_6': -0.46090943, 'w_emb_7': 0.87250936,
  'w_emb_8': 0.036648277, 'w_emb_9': -0.3057043, 'w_emb_10': 0.33427167, 'w_emb_11': -0.19664396,
  'w_emb_12': -0.64899784, 'w_emb_13': -0.1785065, 'w_emb_14': -0.117423356, 'w_emb_15': 0.16247013,
  'w_emb_16': 0.11694676, 'w_emb_17': -0.30693895, 'w_emb_18': -1.0026807, 'w_emb_19': 0.9946743,
  '-1:word.lower()': '\\nreference...n\\n\\n\\n\\n\\n\\n\\n\\n0\\ntable\\n1\\n:\\nairbrake\\ncas\\nmessages\\n',
  '-1:word.istitle()': False, '-1:word.isupper()': False,
  '-1:postag': 'POS', '-1:postag[:2]': 'PO',
  '+1:word.lower()': 'issue', '+1:word.istitle()': False, '+1:word.isupper()': True,
  '+1:postag': 'POS', '+1:postag[:2]': 'PO'},
 {'bias': 1.0,
  'word.lower()': 'issue',
  'word[-3:]': 'SUE', 'word[-2:]': 'UE',
  'word.isupper()': True, 'word.istitle()': False, 'word.isdigit()': False,
  'postag': 'POS', 'postag[:2]': 'PO',
  'w_emb_0': -1.2204882, 'w_emb_1': 0.8920707, 'w_emb_2': -3.8380668, 'w_emb_3': 1.5641377,
  'w_emb_4': 2.1918254, 'w_emb_5': 1.8509868, 'w_emb_6': -2.0664182, 'w_emb_7': 3.1591077,
  'w_emb_8': -0.33126026, 'w_emb_9': -1.4278139, 'w_emb_10': 0.9291533, 'w_emb_11': -0.6761407,
  'w_emb_12': -2.9582167, 'w_emb_13': -0.5395561, 'w_emb_14': -0.8363763, 'w_emb_15': 0.25568742,
  'w_emb_16': 0.4932978, 'w_emb_17': -1.6198335, 'w_emb_18': -4.183924, 'w_emb_19': 4.281094,
  '-1:word.lower()': 'reference', '-1:word.istitle()': False, '-1:word.isupper()': True,
  '-1:postag': 'POS', '-1:postag[:2]': 'PO',
  '+1:word.lower()': 'date', '+1:word.istitle()': False, '+1:word.isupper()': True,
  '+1:postag': 'POS', '+1:postag[:2]': 'PO'}...]
y_train = ['O', 'O', 'O'...'I-data-ca-s_message-type'....'B-data-ca-s_message-type']

Here is the model definition and training:

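```
import sklearn_crfsuite
from sklearn_crfsuite import metrics

crf = sklearn_crfsuite.CRF(
    algorithm='lbfgs',
    c1=0.1,
    c2=0.1,
    max_iterations=100,
    all_possible_transitions=True
)
crf.fit(X_train, y_train)

# `labels` was not defined in the snippet as posted; assumed here to follow
# the tutorial: every label the model has seen, except the outside tag 'O'.
labels = list(crf.classes_)
labels.remove('O')

y_pred = crf.predict(X_test)
# Group the B- and I- tags of the same entity next to each other in the report.
sorted_labels = sorted(labels, key=lambda name: (name[1:], name[0]))

msg = metrics.flat_classification_report(y_test, y_pred, labels=sorted_labels, digits=4)
print(msg)
```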

Unfortunately, my model does not learn anything:

                           precision    recall  f1-score   support   
B-data-c-a-s_message-type     0.0000    0.0000    0.0000        23  
I-data-c-a-s_message-type     0.0000    0.0000    0.0000        90
                micro avg     0.0000    0.0000    0.0000       113
                macro avg     0.0000    0.0000    0.0000       113
             weighted avg     0.0000    0.0000    0.0000       113

Answer 1

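The problem is solved. As shown above, the support (the number of evaluation samples) totals 113. However, the training set contained only about 14 samples, which is far too few, and I simply had not noticed the discrepancy. I swapped the training and test datasets, and the performance now looks like this: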

                           precision    recall  f1-score   support
B-data-c-a-s_message-type     0.0000    0.0000    0.0000         0
I-data-c-a-s_message-type     0.6364    1.0000    0.7778        14
                micro avg     0.6364    1.0000    0.7778        14
                macro avg     0.3182    0.5000    0.3889        14
             weighted avg     0.6364    1.0000    0.7778        14
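
A mismatch like this is easy to catch before training by counting the entity labels in each split. The following is only a minimal sketch, not code from the original post: it assumes the labels are stored as a list of label sequences (the format that `CRF.fit` in sklearn_crfsuite takes), and the helper names `label_counts` and `compare_splits`, as well as the toy data, are made up for illustration.

```python
from collections import Counter


def label_counts(sequences, ignore=("O",)):
    """Count labels across a list of label sequences, skipping those in `ignore`."""
    counts = Counter()
    for seq in sequences:
        counts.update(label for label in seq if label not in ignore)
    return counts


def compare_splits(y_train, y_test):
    """Return {label: (train_count, test_count)} for every label seen in either split."""
    train, test = label_counts(y_train), label_counts(y_test)
    return {label: (train[label], test[label])
            for label in sorted(set(train) | set(test))}


# Toy data (hypothetical, not the dataset from the question).
y_train = [["O", "B-X", "I-X"], ["O", "O"]]
y_test = [["B-X", "I-X", "I-X"], ["B-X", "I-X", "O"]]

for label, (n_train, n_test) in compare_splits(y_train, y_test).items():
    # Flag labels that have fewer training examples than test examples.
    flag = "  <-- fewer training than test examples" if n_train < n_test else ""
    print(f"{label:<10} train={n_train:<4} test={n_test:<4}{flag}")
```

If a label shows up far more often in the test split than in the training split, as happened here, the splits are probably the wrong way round or the training set is simply too small.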

Solution 1
0 Accepted 2020-04-01 09:15:52