
Convert Prodigy JSONL / spaCy Doc format to CoNLL

I have been searching for a while but haven't found a solution to my problem. For a relation classification task, I have annotated several news-like text documents with the Prodigy annotation software. Prodigy outputs the annotations as a JSONL file, which can be converted into a .spacy file. In the JSONL format, each line represents one news article with its annotations.

Now I want to convert my annotations into a more standardized format like CoNLL, so that I can work with them in other open source software such as INCEpTION (unfortunately, Prodigy has not been a good choice). So far I haven't found any library, script or tool that can convert Prodigy JSONL or spaCy files to CoNLL.

Here is an example of what the Prodigy JSONL format looks like:

{
  "text": "My mother’s name is Sasha Smith. She likes dogs and pedigree cats.",
  "tokens": [
    {"text": "My", "start": 0, "end": 2, "id": 0, "ws": true},
    {"text": "mother", "start": 3, "end": 9, "id": 1, "ws": false},
    {"text": "’s", "start": 9, "end": 11, "id": 2, "ws": true},
    {"text": "name", "start": 12, "end": 16, "id": 3, "ws": true },
    {"text": "is", "start": 17, "end": 19, "id": 4, "ws": true },
    {"text": "Sasha", "start": 20, "end": 25, "id": 5, "ws": true},
    {"text": "Smith", "start": 26, "end": 31, "id": 6, "ws": true},
    {"text": ".", "start": 31, "end": 32, "id": 7, "ws": true, "disabled": true},
    {"text": "She", "start": 33, "end": 36, "id": 8, "ws": true},
    {"text": "likes", "start": 37, "end": 42, "id": 9, "ws": true},
    {"text": "dogs", "start": 43, "end": 47, "id": 10, "ws": true},
    {"text": "and", "start": 48, "end": 51, "id": 11, "ws": true, "disabled": true},
    {"text": "pedigree", "start": 52, "end": 60, "id": 12, "ws": true},
    {"text": "cats", "start": 61, "end": 65, "id": 13, "ws": true},
    {"text": ".", "start": 65, "end": 66, "id": 14, "ws": false, "disabled": true}
  ],
  "spans": [
    {"start": 20, "end": 31, "token_start": 5, "token_end": 6, "label": "PERSON"},
    {"start": 43, "end": 47, "token_start": 10, "token_end": 10, "label": "NP"},
    {"start": 52, "end": 65, "token_start": 12, "token_end": 13, "label": "NP"}
  ],
  "relations": [
    {
      "head": 0,
      "child": 1,
      "label": "POSS",
      "head_span": {"start": 0, "end": 2, "token_start": 0, "token_end": 0, "label": null},
      "child_span": {"start": 3, "end": 9, "token_start": 1, "token_end": 1, "label": null}
    },
    {
      "head": 1,
      "child": 8,
      "label": "COREF",
      "head_span": {"start": 3, "end": 9, "token_start": 1, "token_end": 1, "label": null},
      "child_span": {"start": 33, "end": 36, "token_start": 8, "token_end": 8, "label": null}
    },
    {
      "head": 9,
      "child": 13,
      "label": "OBJECT",
      "head_span": {"start": 37, "end": 42, "token_start": 9, "token_end": 9, "label": null},
      "child_span": {"start": 52, "end": 65, "token_start": 12, "token_end": 13, "label": "NP"}
    }
  ]
}

Thanks in advance.

I want to convert either the Prodigy JSONL file or the .spacy annotation file into CoNLL.

You can load your spaCy Docs from the .spacy file and use spacy-conll to dump them as CoNLL files.
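If installing extra packages is not an option, the span annotations can also be written out directly from the JSONL. The following is a minimal sketch using only the standard library, not the spacy-conll approach: it produces a two-column, CoNLL-2003-style file (token, BIO tag) with a blank line between documents. The function names `record_to_conll` and `jsonl_to_conll` are made up for illustration. It assumes that each token's `id` equals its position in the `tokens` list, that `token_end` is inclusive (as in the example above), and that spans do not overlap. The `relations` entries are ignored, since this two-column layout has no column for them.

```python
import json


def record_to_conll(record):
    """Render one Prodigy record as 'token<TAB>BIO-tag' lines."""
    tokens = record["tokens"]
    tags = ["O"] * len(tokens)  # every token starts outside any span
    for span in record.get("spans", []):
        # Assumption: token_start/token_end are list positions, end inclusive.
        first, last = span["token_start"], span["token_end"]
        tags[first] = "B-" + span["label"]
        for i in range(first + 1, last + 1):
            tags[i] = "I-" + span["label"]
    return "\n".join(f"{tok['text']}\t{tag}" for tok, tag in zip(tokens, tags))


def jsonl_to_conll(jsonl_text):
    """Convert JSONL text (one record per line) to CoNLL-style text."""
    records = (json.loads(line) for line in jsonl_text.splitlines() if line.strip())
    # A blank line separates documents, as is usual in CoNLL-style files.
    return "\n\n".join(record_to_conll(r) for r in records) + "\n"


if __name__ == "__main__":
    demo = {
        "tokens": [{"text": "Sasha", "id": 0}, {"text": "Smith", "id": 1},
                   {"text": "likes", "id": 2}, {"text": "dogs", "id": 3}],
        "spans": [{"token_start": 0, "token_end": 1, "label": "PERSON"},
                  {"token_start": 3, "token_end": 3, "label": "NP"}],
    }
    print(jsonl_to_conll(json.dumps(demo)), end="")
```

For a real export, read the file contents and pass them to `jsonl_to_conll`, then write the result to a new file. Whether the target tool accepts this exact column layout depends on which CoNLL variant it expects, so check its import options first.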
