My sklearn_crfsuite model does not learn anything
I am trying to build an annotation prediction model following the tutorial here, but my model does not learn anything. Here is a sample of my training data and labels:
```
[{'bias': 1.0,
  'word.lower()': '\\nreference\\nissue\\ndate\\ndgt86620\\n4\\n \\n19-dec-05\\nfalcon\\n7x\\ntype\\ncertification\\n27_4-100\\nthis\\ndocument\\nis\\nthe\\nintlectual\\nprop...nairbrakes\\nhandle\\nposition\\n...\\n0\\ntable\\n1\\n:\\nairbrake\\ncas\\nmessages\\n',
  'word[-3:]': 'es\\n', 'word[-2:]': 's\\n',
  'word.isupper()': False, 'word.istitle()': False, 'word.isdigit()': False,
  'postag': 'POS', 'postag[:2]': 'PO',
  'w_emb_0': 0.03418987928976114, 'w_emb_1': 0.6173382811066742, 'w_emb_2': 0.004420982990809508,
  'w_emb_3': 0.08293022662242588, 'w_emb_4': 0.22162269482070363, 'w_emb_5': 0.4334545347397811,
  'w_emb_6': 0.7844891779932379, 'w_emb_7': 0.028043262790094503, 'w_emb_8': 0.5233847386564157,
  'w_emb_9': 0.9685677133128328, 'w_emb_10': 0.19379126558708126, 'w_emb_11': 0.2809608896964926,
  'w_emb_12': 0.384759230815804, 'w_emb_13': 0.15385904662767336, 'w_emb_14': 0.5206500040610533,
  'w_emb_15': 0.009148526006733215, 'w_emb_16': 0.5894118695171416, 'w_emb_17': 0.7356989708459056,
  'w_emb_18': 0.5576774100159024, 'w_emb_19': 0.2185294430010376,
  'BOS': True,
  '+1:word.lower()': 'reference', '+1:word.istitle()': False, '+1:word.isupper()': True,
  '+1:postag': 'POS', '+1:postag[:2]': 'PO'},
 {'bias': 1.0, 'word.lower()': 'reference',
  'word[-3:]': 'NCE', 'word[-2:]': 'CE',
  'word.isupper()': True, 'word.istitle()': False, 'word.isdigit()': False,
  'postag': 'POS', 'postag[:2]': 'PO',
  'w_emb_0': -0.390038, 'w_emb_1': 0.30677223, 'w_emb_2': -1.010975, 'w_emb_3': 0.3656154,
  'w_emb_4': 0.5319459, 'w_emb_5': 0.45572615, 'w_emb_6': -0.46090943, 'w_emb_7': 0.87250936,
  'w_emb_8': 0.036648277, 'w_emb_9': -0.3057043, 'w_emb_10': 0.33427167, 'w_emb_11': -0.19664396,
  'w_emb_12': -0.64899784, 'w_emb_13': -0.1785065, 'w_emb_14': -0.117423356, 'w_emb_15': 0.16247013,
  'w_emb_16': 0.11694676, 'w_emb_17': -0.30693895, 'w_emb_18': -1.0026807, 'w_emb_19': 0.9946743,
  '-1:word.lower()': '\\nreference...n0\\ntable\\n1\\n:\\nairbrake\\ncas\\nmessages\\n',
  '-1:word.istitle()': False, '-1:word.isupper()': False,
  '-1:postag': 'POS', '-1:postag[:2]': 'PO',
  '+1:word.lower()': 'issue', '+1:word.istitle()': False, '+1:word.isupper()': True,
  '+1:postag': 'POS', '+1:postag[:2]': 'PO'},
 {'bias': 1.0, 'word.lower()': 'issue',
  'word[-3:]': 'SUE', 'word[-2:]': 'UE',
  'word.isupper()': True, 'word.istitle()': False, 'word.isdigit()': False,
  'postag': 'POS', 'postag[:2]': 'PO',
  'w_emb_0': -1.2204882, 'w_emb_1': 0.8920707, 'w_emb_2': -3.8380668, 'w_emb_3': 1.5641377,
  'w_emb_4': 2.1918254, 'w_emb_5': 1.8509868, 'w_emb_6': -2.0664182, 'w_emb_7': 3.1591077,
  'w_emb_8': -0.33126026, 'w_emb_9': -1.4278139, 'w_emb_10': 0.9291533, 'w_emb_11': -0.6761407,
  'w_emb_12': -2.9582167, 'w_emb_13': -0.5395561, 'w_emb_14': -0.8363763, 'w_emb_15': 0.25568742,
  'w_emb_16': 0.4932978, 'w_emb_17': -1.6198335, 'w_emb_18': -4.183924, 'w_emb_19': 4.281094,
  '-1:word.lower()': 'reference', '-1:word.istitle()': False, '-1:word.isupper()': True,
  '-1:postag': 'POS', '-1:postag[:2]': 'PO',
  '+1:word.lower()': 'date', '+1:word.istitle()': False, '+1:word.isupper()': True,
  '+1:postag': 'POS', '+1:postag[:2]': 'PO'},
 ...]

y_train = ['O', 'O', 'O', ..., 'I-data-c-a-s_message-type', ..., 'B-data-c-a-s_message-type']
```
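For comparison, the sklearn_crfsuite tutorial builds one feature dict per token from a tokenized sentence. A minimal sketch along those lines (the `(word, postag)` input format and feature names are illustrative, mirroring the keys in the sample above):

```python
def word2features(sent, i):
    """Build a feature dict for the token at position i of sent,
    where sent is a list of (word, postag) pairs."""
    word, postag = sent[i]
    features = {
        'bias': 1.0,
        'word.lower()': word.lower(),
        'word[-3:]': word[-3:],
        'word[-2:]': word[-2:],
        'word.isupper()': word.isupper(),
        'word.istitle()': word.istitle(),
        'word.isdigit()': word.isdigit(),
        'postag': postag,
        'postag[:2]': postag[:2],
    }
    if i > 0:
        prev_word, prev_postag = sent[i - 1]
        features.update({
            '-1:word.lower()': prev_word.lower(),
            '-1:word.istitle()': prev_word.istitle(),
            '-1:word.isupper()': prev_word.isupper(),
            '-1:postag': prev_postag,
            '-1:postag[:2]': prev_postag[:2],
        })
    else:
        features['BOS'] = True  # beginning of sequence
    if i < len(sent) - 1:
        next_word, next_postag = sent[i + 1]
        features.update({
            '+1:word.lower()': next_word.lower(),
            '+1:word.istitle()': next_word.istitle(),
            '+1:word.isupper()': next_word.isupper(),
            '+1:postag': next_postag,
            '+1:postag[:2]': next_postag[:2],
        })
    else:
        features['EOS'] = True  # end of sequence
    return features
```

Note that in the pasted sample the very first "token" contains an entire document in `word.lower()`, which suggests the tokenization step is worth double-checking as well.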
Here is the model definition and training code:
```python
import sklearn_crfsuite
from sklearn_crfsuite import metrics

crf = sklearn_crfsuite.CRF(
    algorithm='lbfgs',
    c1=0.1,
    c2=0.1,
    max_iterations=100,
    all_possible_transitions=True
)
crf.fit(X_train, y_train)

y_pred = crf.predict(X_test)
sorted_labels = sorted(labels, key=lambda name: (name[1:], name[0]))
msg = metrics.flat_classification_report(y_test, y_pred, labels=sorted_labels, digits=4)
print(msg)
```
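Before fitting, it is worth sanity-checking how many sequences and labelled tokens each split actually contains, since a tiny or skewed split silently produces a useless model. A quick stdlib-only check (the variable names mirror those in the snippet above):

```python
from collections import Counter

def describe_split(name, X, y):
    """Print sequence count, token count, and label distribution for a split."""
    assert len(X) == len(y), "feature and label sequences must align"
    n_tokens = sum(len(seq) for seq in y)
    label_counts = Counter(label for seq in y for label in seq)
    print(f"{name}: {len(X)} sequences, {n_tokens} tokens, labels: {dict(label_counts)}")
    return n_tokens, label_counts
```

For example, `describe_split('train', X_train, y_train)` followed by the same call for the test split makes a 14-vs-113 imbalance like the one below immediately visible.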
Unfortunately, my model does not learn anything:
```
                           precision    recall  f1-score   support

B-data-c-a-s_message-type     0.0000    0.0000    0.0000        23
I-data-c-a-s_message-type     0.0000    0.0000    0.0000        90

                micro avg     0.0000    0.0000    0.0000       113
                macro avg     0.0000    0.0000    0.0000       113
             weighted avg     0.0000    0.0000    0.0000       113
```
The problem is now solved. As shown above, the support (the number of evaluation samples) totals 113, yet the training set contained only about 14 samples!! That is far too small, and I simply had not noticed the mismatch. After swapping the training and test datasets, the performance now looks like this:
```
                           precision    recall  f1-score   support

B-data-c-a-s_message-type     0.0000    0.0000    0.0000         0
I-data-c-a-s_message-type     0.6364    1.0000    0.7778        14

                micro avg     0.6364    1.0000    0.7778        14
                macro avg     0.3182    0.5000    0.3889        14
             weighted avg     0.6364    1.0000    0.7778        14
```
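For reference, `flat_classification_report` concatenates all sequences before scoring, so the micro-averaged numbers above are computed over flattened token labels. A simplified sketch of that computation, restricted to the labels of interest (a hypothetical helper, not the library's implementation):

```python
def flat_micro_precision_recall(y_true, y_pred, labels):
    """Micro-averaged precision/recall over flattened label sequences,
    counting only tokens whose reference or predicted label is in `labels`."""
    true_flat = [l for seq in y_true for l in seq]
    pred_flat = [l for seq in y_pred for l in seq]
    tp = sum(1 for t, p in zip(true_flat, pred_flat) if t == p and t in labels)
    pred_pos = sum(1 for p in pred_flat if p in labels)  # predicted positives
    true_pos = sum(1 for t in true_flat if t in labels)  # reference positives
    precision = tp / pred_pos if pred_pos else 0.0
    recall = tp / true_pos if true_pos else 0.0
    return precision, recall
```

This also explains the pattern in the second report: a recall of 1.0000 with precision 0.6364 means every true `I-` token was found, but the model also tagged extra tokens with that label.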