Build GoldDoc with a spaCy offset format to train a blank model with CLI
I'm currently doing NER with 3 labels.
I am able to train my model with Python code, but I want to use CLI training, which gives more flexibility.
I have converted my data to the spaCy offset training format, which looks like this:
[
["Bonjour\r\n\r\n\r\n\r\ncordialement, Thomas\r\n\r\n tel 0102030405",{"entities": [[70,79,"PHONE"],[56,61,"PER"]]}]
]
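Each entity here is a [start, end, label] triple of character offsets into the text, with the end offset exclusive. A quick way to sanity-check such data before converting it is to slice the text with each pair of offsets and look at what comes out. The snippet below is only a sketch: the sample string and the helper name entity_texts are made up for illustration and are not part of spaCy.

```python
import json

# Made-up sample in the same layout as above:
# [text, {"entities": [[start, end, label], ...]}]
raw = (
    '[["Cordialement, Thomas tel 0102030405", '
    '{"entities": [[25, 35, "PHONE"], [14, 20, "PER"]]}]]'
)


def entity_texts(examples):
    """Return (label, substring) pairs so each offset can be checked by eye."""
    pairs = []
    for text, annotations in examples:
        for start, end, label in annotations["entities"]:
            # The end offset is exclusive, so a plain slice gives the entity text
            pairs.append((label, text[start:end]))
    return pairs


examples = json.loads(raw)
for label, span in entity_texts(examples):
    print(label, repr(span))
```

If a printed substring is not the entity you expected, the offsets were probably computed against a different version of the text (for example before line endings were normalized).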
To train and evaluate my model with the CLI, I need to transform this data into the gold format.
I'm already aware of the method below, but it needs an existing nlp object:
from spacy.gold import biluo_tags_from_offsets

doc = nlp(text)
tags = biluo_tags_from_offsets(doc, offsets)
My question is: how can I convert the spaCy offset format to the gold format when I need to create a new model with my own specific labels, rather than starting from an existing nlp pipeline?
You only need the model here for tokenization and sentence segmentation, so a blank pipeline with a sentencizer works just as well:
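from spacy.lang.en import English

# Blank English pipeline: tokenizer only, no trained components
nlp = English()
# Rule-based sentence segmentation (spaCy v2 syntax)
nlp.add_pipe(nlp.create_pipe("sentencizer"))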
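To make the connection to the question explicit, here is a minimal sketch of running the offset-to-BILUO conversion on a blank pipeline. It assumes spaCy is installed; the sample text and offsets are made up, and the try/except import is only there because the helper is named biluo_tags_from_offsets in spaCy v2 (spacy.gold) and offsets_to_biluo_tags in v3 (spacy.training). The sentencizer is left out because the tags only depend on tokenization.

```python
from spacy.lang.en import English

# The helper was renamed between major versions; try the v3 name first.
try:
    from spacy.training import offsets_to_biluo_tags  # spaCy v3
except ImportError:
    from spacy.gold import biluo_tags_from_offsets as offsets_to_biluo_tags  # spaCy v2

# Blank pipeline: only the tokenizer, no trained components and no labels yet
nlp = English()

# Made-up sample; offsets are character positions, end exclusive
text = "Cordialement, Thomas tel 0102030405"
entities = [(25, 35, "PHONE"), (14, 20, "PER")]

# make_doc runs the tokenizer only
doc = nlp.make_doc(text)
tags = offsets_to_biluo_tags(doc, entities)

for token, tag in zip(doc, tags):
    print(token.text, tag)
```

The labels come entirely from your annotations, so nothing about the blank pipeline restricts which labels you can use. If an offset does not line up with a token boundary, the affected tokens are tagged "-" instead of a BILUO tag, which is worth checking for before training.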