Building a GoldDoc from the spaCy offset format to train a blank model with the CLI

Question

I'm currently doing NER with three labels:

PERSON
PHONE
ADDRESS

I can train my model with Python code, but I want to use CLI training, which gives more flexibility.

I have converted my data to the spaCy offset training format, which looks like this:

[
    ["Bonjour\r\n\r\n\r\n\r\ncordialement, Thomas\r\n\r\n tel 0102030405",{"entities": [[70,79,"PHONE"],[56,61,"PER"]]}]
]

To train and evaluate my model with the CLI, I need to convert this data to the gold format.

I'm already aware of the method below, but it requires an existing nlp object:

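from spacy.gold import biluo_tags_from_offsets  # spaCy v2 import path
doc = nlp(text)  # nlp must already be a loaded pipeline
tags = biluo_tags_from_offsets(doc, offsets)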

My question is: how can I convert spaCy offsets to the gold format when I need to create a model with specific labels?

Answer 1

You only need the model here for tokenization and sentence segmentation, so it would also work to use a blank English pipeline with a sentencizer:

from spacy.lang.en import English
nlp = English()
nlp.add_pipe(nlp.create_pipe("sentencizer"))
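To illustrate why tokenization alone is enough, here is a minimal, library-free sketch of what an offset-to-BILUO conversion does. It is not spaCy's implementation: the whitespace tokenizer is a stand-in for spaCy's tokenizer, and the names `tokenize` and `offsets_to_biluo` are hypothetical helpers introduced only for this example. Spans that do not line up with token boundaries are marked "-" in this sketch.

```python
import re


def tokenize(text):
    """Hypothetical whitespace tokenizer: returns (token, start, end) triples."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def offsets_to_biluo(text, entities):
    """Map character-offset entities to one BILUO tag per token (sketch only)."""
    tokens = tokenize(text)
    tags = ["O"] * len(tokens)
    for start, end, label in entities:
        # Tokens that lie entirely inside the entity span.
        covered = [i for i, (_, s, e) in enumerate(tokens) if s >= start and e <= end]
        aligned = (
            bool(covered)
            and tokens[covered[0]][1] == start
            and tokens[covered[-1]][2] == end
        )
        if not aligned:
            # The span cuts through a token: flag every overlapping token.
            for i, (_, s, e) in enumerate(tokens):
                if s < end and e > start:
                    tags[i] = "-"
            continue
        if len(covered) == 1:
            tags[covered[0]] = "U-" + label  # single-token entity
        else:
            tags[covered[0]] = "B-" + label  # first token
            for i in covered[1:-1]:
                tags[i] = "I-" + label  # inner tokens
            tags[covered[-1]] = "L-" + label  # last token
    return [tok for tok, _, _ in tokens], tags


# Made-up sample text; the offsets below were computed for this string only.
sample = "cordialement, Thomas\r\n\r\n tel 0102030405"
words, tags = offsets_to_biluo(sample, [[14, 20, "PERSON"], [29, 39, "PHONE"]])
for word, tag in zip(words, tags):
    print(word, tag)
```

No trained weights are involved at any point: the only input besides the annotations is a tokenization. With spaCy itself, the equivalent step is the `biluo_tags_from_offsets(doc, offsets)` call from the question, with `doc` produced by the blank `English()` pipeline above.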

Answer 1 posted 2019-12-09 12:51:40
