How to convert from CoNLL format to spaCy format
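I'm currently working on an NER model. I have a lot of data stored in CoNLL format that I need to convert to spaCy's training format. In CoNLL, every word of a sentence has a tag next to it. In the spaCy format, only the words that actually carry an entity label are listed, as character offsets into the sentence text. How can I convert from this CoNLL format: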
From O
2001 B-DateTime
to I-DateTime
2004 I-DateTime
, O
I O
was O
a O
stagehand O
for O
Hartford B-Company
Stage I-Company
Company O
. O
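To do the conversion by hand, the first step is grouping the token/tag lines into sentences. Below is a minimal plain-Python sketch. It assumes whitespace-separated columns with the token first and the tag last, blank lines between sentences, and optional -DOCSTART- lines (the CoNLL-2003 convention). read_conll is a hypothetical helper name, not part of spaCy.

```python
from typing import List, Tuple

# One sentence = list of (token, tag) pairs
Sentence = List[Tuple[str, str]]


def read_conll(text: str) -> List[Sentence]:
    """Split CoNLL-style NER text into sentences of (token, tag) pairs.

    Assumes one token per line: first column is the token, last column
    is the NER tag. Blank lines end a sentence; -DOCSTART- lines are skipped.
    """
    sentences: List[Sentence] = []
    current: Sentence = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("-DOCSTART-"):
            # Sentence (or document) boundary: flush what we have so far
            if current:
                sentences.append(current)
                current = []
            continue
        columns = line.split()
        current.append((columns[0], columns[-1]))
    if current:  # the file may not end with a blank line
        sentences.append(current)
    return sentences
```

Each sentence can then be unzipped into parallel token and tag lists with zip(*sentence).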
to this spaCy format?
TRAIN_DATA = [('what is the price of polo?', {'entities': [(21, 25, 'PrdName')]}),
('what is the price of ball?', {'entities': [(21, 25, 'PrdName')]}),
('what is the price of jegging?', {'entities': [(21, 28, 'PrdName')]}),
('what is the price of t-shirt?', {'entities': [(21, 28, 'PrdName')]}),
('what is the price of jeans?', {'entities': [(21, 26, 'PrdName')]}),
('what is the price of bat?', {'entities': [(21, 24, 'PrdName')]}),
('what is the price of shirt?', {'entities': [(21, 26, 'PrdName')]}),
('what is the price of bag?', {'entities': [(21, 24, 'PrdName')]}),
('what is the price of cup?', {'entities': [(21, 24, 'PrdName')]}),
('what is the price of jug?', {'entities': [(21, 24, 'PrdName')]}),
('what is the price of plate?', {'entities': [(21, 26, 'PrdName')]}),
('what is the price of glass?', {'entities': [(21, 26, 'PrdName')]}),
('what is the price of watch?', {'entities': [(21, 26, 'PrdName')]})]
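The core of the conversion is turning per-token IOB tags into (start, end, label) character offsets like the ones above. Below is a minimal sketch. It assumes the tokens are simply re-joined with single spaces, so the text reads "2004 , I" rather than "2004, I", and the offsets refer to that re-joined text. to_spacy_example is a hypothetical name, and the function makes no attempt to restore the original whitespace.

```python
from typing import Dict, List, Tuple

Entity = Tuple[int, int, str]


def to_spacy_example(
    tokens: List[str], tags: List[str]
) -> Tuple[str, Dict[str, List[Entity]]]:
    """Convert one IOB-tagged sentence into a (text, {"entities": [...]}) tuple.

    Assumption: tokens are joined with single spaces to build the text.
    """
    text = ""
    entities: List[Entity] = []
    ent_start, ent_end, ent_label = 0, 0, None  # currently open entity
    for token, tag in zip(tokens, tags):
        if text:
            text += " "
        tok_start = len(text)
        text += token
        tok_end = len(text)
        if tag.startswith("I-") and tag[2:] == ent_label:
            ent_end = tok_end  # extend the open entity
            continue
        # Any other tag closes the open entity, if there is one
        if ent_label is not None:
            entities.append((ent_start, ent_end, ent_label))
            ent_label = None
        if tag.startswith(("B-", "I-")):
            # B- starts an entity; a stray I- (IOB1 style) starts one too
            ent_start, ent_end, ent_label = tok_start, tok_end, tag[2:]
    if ent_label is not None:  # entity running to the end of the sentence
        entities.append((ent_start, ent_end, ent_label))
    return text, {"entities": entities}
```

For the CoNLL sample in the question this yields (5, 17, 'DateTime') for "2001 to 2004" and (42, 56, 'Company') for "Hartford Stage".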
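Just use spacy convert. For data like yours (one token and its IOB tag per line), pick the ner converter; the conll converter expects CoNLL-U dependency data. The output directory is a positional argument:
python -m spacy convert input.conll ./output/ --converter ner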
Note that by default this produces a binary .spacy file. The JSON format is deprecated in v3 and isn't much help.
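If you still need TRAIN_DATA-style tuples (for example, for older v2-style training code), the .spacy file can be read back. It is a serialized DocBin. Below is a minimal sketch, assuming spaCy v3 is installed. docbin_to_tuples is a hypothetical helper, and the path in the usage note is only an example of where the converted file might land.

```python
import spacy
from spacy.tokens import DocBin


def docbin_to_tuples(path: str, lang: str = "en"):
    """Load a .spacy file and return (text, {"entities": [...]}) tuples."""
    nlp = spacy.blank(lang)  # only the vocab is needed to deserialize
    doc_bin = DocBin().from_disk(path)
    return [
        (
            doc.text,
            {"entities": [(ent.start_char, ent.end_char, ent.label_) for ent in doc.ents]},
        )
        for doc in doc_bin.get_docs(nlp.vocab)
    ]
```

Usage (hypothetical path): TRAIN_DATA = docbin_to_tuples("./output/input.spacy"). For v3 training itself, it is simpler to pass the .spacy file directly to spacy train than to go back to tuples.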