
How to convert from CoNLL format to spaCy format

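I am currently working on an NER model. I have a bunch of data stored in CoNLL format that I need to convert to spaCy format. In CoNLL, every word of a sentence has a tag next to it. In spaCy, labels are given only for the words that actually carry one. How can I convert from this format (CoNLL):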

From    O
2001    B-DateTime
to  I-DateTime
2004    I-DateTime
,   O
I   O
was O
a   O
stagehand   O
for O
Hartford    B-Company
Stage   I-Company
Company O
.   O

to this format (spaCy)?

TRAIN_DATA = [('what is the price of polo?', {'entities': [(21, 25, 'PrdName')]}), 
              ('what is the price of ball?', {'entities': [(21, 25, 'PrdName')]}), 
              ('what is the price of jegging?', {'entities': [(21, 28, 'PrdName')]}), 
              ('what is the price of t-shirt?', {'entities': [(21, 28, 'PrdName')]}), 
              ('what is the price of jeans?', {'entities': [(21, 26, 'PrdName')]}), 
              ('what is the price of bat?', {'entities': [(21, 24, 'PrdName')]}), 
              ('what is the price of shirt?', {'entities': [(21, 26, 'PrdName')]}), 
              ('what is the price of bag?', {'entities': [(21, 24, 'PrdName')]}), 
              ('what is the price of cup?', {'entities': [(21, 24, 'PrdName')]}), 
              ('what is the price of jug?', {'entities': [(21, 24, 'PrdName')]}), 
              ('what is the price of plate?', {'entities': [(21, 26, 'PrdName')]}), 
              ('what is the price of glass?', {'entities': [(21, 26, 'PrdName')]}),
              ('what is the price of watch?', {'entities': [(21, 26, 'PrdName')]})]
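
The correspondence between the two formats can be sketched in plain Python: B-/I- tags are merged into spans, and each span is recorded as character offsets into the sentence text. This is only a minimal sketch under a few assumptions that are not stated in the question: sentences are separated by blank lines, the tag is the last whitespace-separated column, and tokens are re-joined with a single space (CoNLL does not store the original spacing, so the text comes out as "2004 , I" rather than "2004, I"). The function name conll_to_spacy is made up for illustration.

```python
def conll_to_spacy(conll_text):
    """Convert two-column CoNLL text (token, BIO tag) into spaCy-style tuples."""
    # Split into sentences; a blank line is assumed to end a sentence.
    sentences, current = [], []
    for line in conll_text.splitlines():
        if not line.strip():
            if current:
                sentences.append(current)
                current = []
            continue
        parts = line.split()
        current.append((parts[0], parts[-1]))  # (token, tag)
    if current:
        sentences.append(current)

    train_data = []
    for sent in sentences:
        entities = []
        offset = 0                 # character position of the current token
        start = end = 0
        label = None               # label of the entity being built, if any
        for token, tag in sent:
            tok_start, tok_end = offset, offset + len(token)
            if tag.startswith("I-") and label == tag[2:]:
                end = tok_end      # continue the open entity
            else:
                if label is not None:
                    entities.append((start, end, label))
                    label = None
                if tag.startswith(("B-", "I-")):
                    # A stray I- tag without a matching B- starts a new entity.
                    start, end, label = tok_start, tok_end, tag[2:]
            offset = tok_end + 1   # +1 for the single joining space
        if label is not None:
            entities.append((start, end, label))
        text = " ".join(token for token, _ in sent)
        train_data.append((text, {"entities": entities}))
    return train_data
```

Calling conll_to_spacy(open("input.conll").read()) on the sample above (input.conll being a placeholder file name) would give one tuple whose entities cover "2001 to 2004" as DateTime and "Hartford Stage" as Company.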

Just use spacy convert:

spacy convert input.conll -c conll -o ./output/

Note that by default this generates a binary .spacy file. The JSON format is deprecated in v3 and is not much help.
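
If the tuple format from the question is still needed, the binary file can be read back with spaCy's DocBin and turned into (text, {'entities': [...]}) pairs. The following is a rough sketch, not part of the original answer: the helper names docs_to_train_data and load_train_data are made up, the blank "en" pipeline is an assumption about the language, and the path in the usage note is only a guess at where the converted file ends up.

```python
def docs_to_train_data(docs):
    """Turn Doc-like objects into (text, {'entities': [...]}) tuples."""
    return [
        (
            doc.text,
            {"entities": [(ent.start_char, ent.end_char, ent.label_)
                          for ent in doc.ents]},
        )
        for doc in docs
    ]


def load_train_data(path, lang="en"):
    """Read a .spacy file and return its documents in the tuple format."""
    # Imported here so docs_to_train_data works even without spaCy installed.
    import spacy
    from spacy.tokens import DocBin

    nlp = spacy.blank(lang)              # assumption: an English blank pipeline
    doc_bin = DocBin().from_disk(path)   # load the serialized docs
    return docs_to_train_data(doc_bin.get_docs(nlp.vocab))
```

For example, TRAIN_DATA = load_train_data("./output/input.spacy"), assuming that is the file the convert command wrote.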
