简体   繁体   English

sklearn_crfsuite的参数需要是字节吗?

[英]Do the parameters of sklearn_crfsuite need to be bytes?

I'm trying to build A conditional random field model, following this tutorial https://www.kaggle.com/shoumikgoswami/ner-using-random-forest-and-crf I have followed all the steps but for some reason when I run the line我正在尝试构建一个条件随机场 model,按照本教程https://www.kaggle.com/shoumikgoswami/ner-using-random-forest-and-crf我已经遵循了所有步骤,但由于某种原因,当我跑线

 pred = cross_val_predict(estimator=crf, X=X, y=y, cv=5)

I get the following error我收到以下错误

TypeError: expected bytes, int found

This is the whole code of building the CRF这是构建 CRF 的全部代码

X = [sent2features(s) for s in sentences]
y = [sent2labels(s) for s in sentences]

crf = sklearn_crfsuite.CRF(algorithm='lbfgs',
                           c1=0.1,
                           c2=0.1,
                           max_iterations=100,
                           all_possible_transitions=False)
pred = cross_val_predict(estimator=crf, X=X, y=y, cv=5)

Environment Conda Python 3.7环境康达 Python 3.7

Data数据

y(labels) : [[0, 0, 1], [0, 1, 0], [0, 0, 1], [0, 1, 0, 1, 0, 1], [0, 0, 1, 0]]
X (words:sequence): [[{'bias': 1.0, 'word.lower()': 'we', 'word[-3:]': 'we', 'word[-2:]': 'we', 'word.isupper()': False, 'word.istitle()': False, 'word.isdigit()': False, 'postag': 'PRP', 'postag[:2]': 'PR', 'BOS': True, '+1:word.lower()': 'have', '+1:word.istitle()': False, '+1:word.isupper()': False, '+1:postag': 'VBP', '+1:postag[:2]': 'VB'}, {'bias': 1.0, 'word.lower()': 'have', 'word[-3:]': 'ave', 'word[-2:]': 've', 'word.isupper()': False, 'word.istitle()': False, 'word.isdigit()': False, 'postag': 'VBP', 'postag[:2]': 'VB', '-1:word.lower()': 'we', '-1:word.istitle()': False, '-1:word.isupper()': False, '-1:postag': 'PRP', '-1:postag[:2]': 'PR', '+1:word.lower()': 'potatos', '+1:word.istitle()': False, '+1:word.isupper()': False, '+1:postag': 'VBN', '+1:postag[:2]': 'VB'}, {'bias': 1.0, 'word.lower()': 'potatos', 'word[-3:]': 'tos', 'word[-2:]': 'os', 'word.isupper()': False, 'word.istitle()': False, 'word.isdigit()': False, 'postag': 'VBN', 'postag[:2]': 'VB', '-1:word.lower()': 'have', '-1:word.istitle()': False, '-1:word.isupper()': False, '-1:postag': 'VBP', '-1:postag[:2]': 'VB', 'EOS': True}], [{'bias': 1.0, 'word.lower()': 'an', 'word[-3:]': 'an', 'word[-2:]': 'an', 'word.isupper()': False, 'word.istitle()': False, 'word.isdigit()': False, 'postag': 'DT', 'postag[:2]': 'DT', 'BOS': True, '+1:word.lower()': 'island', '+1:word.istitle()': False, '+1:word.isupper()': False, '+1:postag': 'NN', '+1:postag[:2]': 'NN'}, {'bias': 1.0, 'word.lower()': 'island', 'word[-3:]': 'and', 'word[-2:]': 'nd', 'word.isupper()': False, 'word.istitle()': False, 'word.isdigit()': False, 'postag': 'NN', 'postag[:2]': 'NN', '-1:word.lower()': 'an', '-1:word.istitle()': False, '-1:word.isupper()': False, '-1:postag': 'DT', '-1:postag[:2]': 'DT', '+1:word.lower()': 'forest', '+1:word.istitle()': False, '+1:word.isupper()': False, '+1:postag': 'JJS', '+1:postag[:2]': 'JJ'}, {'bias': 1.0, 'word.lower()': 'forest', 'word[-3:]': 'est', 'word[-2:]': 'st', 'word.isupper()': False, 'word.istitle()': False, 'word.isdigit()': False, 'postag': 'JJS', 'postag[:2]': 'JJ', '-1:word.lower()': 'island', '-1:word.istitle()': False, '-1:word.isupper()': False, '-1:postag': 'NN', '-1:postag[:2]': 'NN', 'EOS': True}], [{'bias': 1.0, 'word.lower()': 'up', 'word[-3:]': 'up', 'word[-2:]': 'up', 'word.isupper()': False, 'word.istitle()': False, 'word.isdigit()': False, 'postag': 'IN', 'postag[:2]': 'IN', 'BOS': True, '+1:word.lower()': 'the', '+1:word.istitle()': False, '+1:word.isupper()': False, '+1:postag': 'DT', '+1:postag[:2]': 'DT'}, {'bias': 1.0, 'word.lower()': 'the', 'word[-3:]': 'the', 'word[-2:]': 'he', 'word.isupper()': False, 'word.istitle()': False, 'word.isdigit()': False, 'postag': 'DT', 'postag[:2]': 'DT', '-1:word.lower()': 'up', '-1:word.istitle()': False, '-1:word.isupper()': False, '-1:postag': 'IN', '-1:postag[:2]': 'IN', '+1:word.lower()': 'mile', '+1:word.istitle()': False, '+1:word.isupper()': False, '+1:postag': 'NN', '+1:postag[:2]': 'NN'}, {'bias': 1.0, 'word.lower()': 'mile', 'word[-3:]': 'ile', 'word[-2:]': 'le', 'word.isupper()': False, 'word.istitle()': False, 'word.isdigit()': False, 'postag': 'NN', 'postag[:2]': 'NN', '-1:word.lower()': 'the', '-1:word.istitle()': False, '-1:word.isupper()': False, '-1:postag': 'DT', '-1:postag[:2]': 'DT', 'EOS': True}], [{'bias': 1.0, 'word.lower()': 'the', 'word[-3:]': 'the', 'word[-2:]': 'he', 'word.isupper()': False, 'word.istitle()': False, 'word.isdigit()': False, 'postag': 'DT', 'postag[:2]': 'DT', 'BOS': True, '+1:word.lower()': 'smell', '+1:word.istitle()': False, '+1:word.isupper()': False, '+1:postag': 'NN', '+1:postag[:2]': 'NN'}, {'bias': 1.0, 'word.lower()': 'smell', 'word[-3:]': 'ell', 'word[-2:]': 'll', 'word.isupper()': False, 'word.istitle()': False, 'word.isdigit()': False, 'postag': 'NN', 'postag[:2]': 'NN', '-1:word.lower()': 'the', '-1:word.istitle()': False, '-1:word.isupper()': False, '-1:postag': 'DT', '-1:postag[:2]': 'DT', '+1:word.lower()': 'of', '+1:word.istitle()': False, '+1:word.isupper()': False, '+1:postag': 'IN', '+1:postag[:2]': 'IN'}, {'bias': 1.0, 'word.lower()': 'of', 'word[-3:]': 'of', 'word[-2:]': 'of', 'word.isupper()': False, 'word.istitle()': False, 'word.isdigit()': False, 'postag': 'IN', 'postag[:2]': 'IN', '-1:word.lower()': 'smell', '-1:word.istitle()': False, '-1:word.isupper()': False, '-1:postag': 'NN', '-1:postag[:2]': 'NN', '+1:word.lower()': 'tulips', '+1:word.istitle()': False, '+1:word.isupper()': False, '+1:postag': 'NNS', '+1:postag[:2]': 'NN'}, {'bias': 1.0, 'word.lower()': 'tulips', 'word[-3:]': 'ips', 'word[-2:]': 'ps', 'word.isupper()': False, 'word.istitle()': False, 'word.isdigit()': False, 'postag': 'NNS', 'postag[:2]': 'NN', '-1:word.lower()': 'of', '-1:word.istitle()': False, '-1:word.isupper()': False, '-1:postag': 'IN', '-1:postag[:2]': 'IN', '+1:word.lower()': 'and', '+1:word.istitle()': False, '+1:word.isupper()': False, '+1:postag': 'CC', '+1:postag[:2]': 'CC'}, {'bias': 1.0, 'word.lower()': 'and', 'word[-3:]': 'and', 'word[-2:]': 'nd', 'word.isupper()': False, 'word.istitle()': False, 'word.isdigit()': False, 'postag': 'CC', 'postag[:2]': 'CC', '-1:word.lower()': 'tulips', '-1:word.istitle()': False, '-1:word.isupper()': False, '-1:postag': 'NNS', '-1:postag[:2]': 'NN', '+1:word.lower()': 'roses', '+1:word.istitle()': False, '+1:word.isupper()': False, '+1:postag': 'NNS', '+1:postag[:2]': 'NN'}, {'bias': 1.0, 'word.lower()': 'roses', 'word[-3:]': 'ses', 'word[-2:]': 'es', 'word.isupper()': False, 'word.istitle()': False, 'word.isdigit()': False, 'postag': 'NNS', 'postag[:2]': 'NN', '-1:word.lower()': 'and', '-1:word.istitle()': False, '-1:word.isupper()': False, '-1:postag': 'CC', '-1:postag[:2]': 'CC', 'EOS': True}], [{'bias': 1.0, 'word.lower()': 'i', 'word[-3:]': 'i', 'word[-2:]': 'i', 'word.isupper()': False, 'word.istitle()': False, 'word.isdigit()': False, 'postag': 'VBP', 'postag[:2]': 'VB', 'BOS': True, '+1:word.lower()': 'would', '+1:word.istitle()': False, '+1:word.isupper()': False, '+1:postag': 'MD', '+1:postag[:2]': 'MD'}, {'bias': 1.0, 'word.lower()': 'would', 'word[-3:]': 'uld', 'word[-2:]': 'ld', 'word.isupper()': False, 'word.istitle()': False, 'word.isdigit()': False, 'postag': 'MD', 'postag[:2]': 'MD', '-1:word.lower()': 'i', '-1:word.istitle()': False, '-1:word.isupper()': False, '-1:postag': 'VBP', '-1:postag[:2]': 'VB', '+1:word.lower()': 'love', '+1:word.istitle()': False, '+1:word.isupper()': False, '+1:postag': 'VB', '+1:postag[:2]': 'VB'}, {'bias': 1.0, 'word.lower()': 'love', 'word[-3:]': 'ove', 'word[-2:]': 've', 'word.isupper()': False, 'word.istitle()': False, 'word.isdigit()': False, 'postag': 'VB', 'postag[:2]': 'VB', '-1:word.lower()': 'would', '-1:word.istitle()': False, '-1:word.isupper()': False, '-1:postag': 'MD', '-1:postag[:2]': 'MD', '+1:word.lower()': 'it', '+1:word.istitle()': False, '+1:word.isupper()': False, '+1:postag': 'PRP', '+1:postag[:2]': 'PR'}, {'bias': 1.0, 'word.lower()': 'it', 'word[-3:]': 'it', 'word[-2:]': 'it', 'word.isupper()': False, 'word.istitle()': False, 'word.isdigit()': False, 'postag': 'PRP', 'postag[:2]': 'PR', '-1:word.lower()': 'love', '-1:word.istitle()': False, '-1:word.isupper()': False, '-1:postag': 'VB', '-1:postag[:2]': 'VB', 'EOS': True}]]

I solved the issue.我解决了这个问题。 As we can see on the data examples the y labels are a list of arrays containing integers 0s and 1s,.正如我们在数据示例中看到的,y 标签是包含整数 0 和 1 的 arrays 列表。 So after I changed the labels from the y variable from ints to strings of 0s and 1s, It did work.因此,在我将 y 变量的标签从整数更改为 0 和 1 的字符串之后,它确实起作用了。

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM