
Convert from Prodigy's JSONL format for labeled NER to spaCy's training format?

I am new to Prodigy and spaCy, as well as to working on the command line. I'd like to use Prodigy to label my data for an NER model, and then use spaCy in Python to create models.

Prodigy outputs in SQLite format. spaCy takes in a different kind of format, which I'm not sure what to call:

LABEL = "ANIMAL"  # placeholder label name (assumed; not given in the question)

TRAIN_DATA = [
    (
        "Horses are too tall and they pretend to care about your feelings",
        {"entities": [(0, 6, LABEL)]},
    ),
    ("Do they bite?", {"entities": []}),
    (
        "horses are too tall and they pretend to care about your feelings",
        {"entities": [(0, 6, LABEL)]},
    ),
    ("horses pretend to care about your feelings", {"entities": [(0, 6, LABEL)]}),
    (
        "they pretend to care about your feelings, those horses",
        {"entities": [(48, 54, LABEL)]},
    ),
    ("horses?", {"entities": [(0, 6, LABEL)]}),
]

How can I convert from one to the other? It seems like this should be easy, but I cannot find it anywhere.

I have no problem loading the dataset; converting it is the only issue.

As of version 1.9, Prodigy should be able to export this training format with the data-to-spacy recipe: https://prodi.gy/docs/recipes#data-to-spacy
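If the recipe is not available, the conversion can also be done by hand. The following is only a minimal sketch, assuming the annotations have already been exported to JSONL (for example with `prodigy db-out`) and that each record carries a "text" field, a "spans" list with "start", "end" and "label" keys, and an "answer" field. The function name `prodigy_jsonl_to_spacy` and the `keep_answers` parameter are hypothetical and not part of Prodigy or spaCy.

```python
import json


def prodigy_jsonl_to_spacy(lines, keep_answers=("accept",)):
    """Turn Prodigy-style JSONL records into (text, {"entities": [...]}) tuples.

    `lines` is any iterable of JSON strings, e.g. an open file object.
    The record layout (text / spans / answer) is an assumption about the export.
    """
    train_data = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the export
        record = json.loads(line)
        # Keep only accepted examples; records without an "answer" are kept too.
        if record.get("answer", "accept") not in keep_answers:
            continue
        entities = [
            (span["start"], span["end"], span["label"])
            for span in record.get("spans", [])
        ]
        train_data.append((record["text"], {"entities": entities}))
    return train_data


# Small in-memory sample standing in for an exported .jsonl file.
sample_lines = [
    '{"text": "horses?", "spans": [{"start": 0, "end": 6, "label": "ANIMAL"}], "answer": "accept"}',
    '{"text": "Do they bite?", "spans": [], "answer": "accept"}',
    '{"text": "skipped example", "spans": [], "answer": "reject"}',
]

converted = prodigy_jsonl_to_spacy(sample_lines)
print(converted)
```

With a real export, the same function could be called as `prodigy_jsonl_to_spacy(open("annotations.jsonl", encoding="utf8"))`, where the file name is a placeholder.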

