
Training spaCy's NER model from scratch on CoNLL 2003 data got very weird results

I'm trying to train NER models from scratch using spaCy. I wanted to first try it out on the CoNLL 2003 data, which is widely used as a benchmark for NER systems.

The following are the commands I ran:

spacy convert -c ner train.txt valid.txt test.txt spacyConverted
cd spacyConverted
python -m spacy train en trained train.txt.json valid.txt.json --no-tagger --no-parser
mkdir displacy
python -m spacy evaluate trained/model-final test.txt.json --displacy-path displacy

However, the evaluation results on the test data are very weird and totally off, as seen in the following displaCy output.

[Screenshot: evaluation results]

The precision, recall, and F1 scores are very low both during training and evaluation.
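For reference, NER scores are usually computed at the entity level: a predicted entity counts as correct only if both its span boundaries and its label exactly match a gold entity. A minimal sketch of that calculation (the span sets below are hypothetical examples, not the actual CoNLL data):

```python
# Entity-level evaluation: an entity is a (start, end, label) span and
# counts as a true positive only on an exact span-and-label match.
def prf(gold, pred):
    tp = len(gold & pred)  # exact matches between gold and predicted spans
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical spans: (token_start, token_end, label)
gold = {(0, 1, "ORG"), (2, 3, "MISC"), (5, 7, "PER")}
pred = {(0, 1, "ORG"), (2, 3, "ORG"), (5, 7, "PER")}  # one label error

print(prf(gold, pred))  # precision = recall = F1 = 2/3
```

Because matches are all-or-nothing per entity, even a model that gets most tokens right can score very low if its span boundaries or labels are systematically off.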

[Screenshot: training scores]

I do believe the commands are correct and in accordance with the documentation. What could be the problem here? Could it be that I must supply some word vectors as well? If so, how do I supply the ones that come with spaCy by default? Or could it be that one cannot use --no-tagger --no-parser?

The converted .json files look like the following:

[
  {
    "id":0,
    "paragraphs":[
      {
        "sentences":[
          {
            "tokens":[
              {
                "orth":"-DOCSTART-",
                "tag":"-X-",
                "ner":"O"
              }
            ]
          },
          {
            "tokens":[
              {
                "orth":"EU",
                "tag":"NNP",
                "ner":"U-ORG"
              },
              {
                "orth":"rejects",
                "tag":"VBZ",
                "ner":"O"
              },
              {
                "orth":"German",
                "tag":"JJ",
                "ner":"U-MISC"
              },
              {
                "orth":"call",
                "tag":"NN",
                "ner":"O"
              },
              {
                "orth":"to",
                "tag":"TO",
                "ner":"O"
              },
              ...
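One way to sanity-check a converted file is to walk this nested structure and count the NER labels it contains. A quick sketch using only the standard library; the embedded sample mirrors the excerpt above rather than reading the real (much larger) file:

```python
import json
from collections import Counter

# Sample mirroring the structure of spaCy's converted .json output
sample = json.loads("""
[{"id": 0, "paragraphs": [{"sentences": [
  {"tokens": [{"orth": "-DOCSTART-", "tag": "-X-", "ner": "O"}]},
  {"tokens": [{"orth": "EU", "tag": "NNP", "ner": "U-ORG"},
              {"orth": "rejects", "tag": "VBZ", "ner": "O"},
              {"orth": "German", "tag": "JJ", "ner": "U-MISC"},
              {"orth": "call", "tag": "NN", "ner": "O"},
              {"orth": "to", "tag": "TO", "ner": "O"}]}
]}]}]
""")

# Count NER tags across all documents, paragraphs, and sentences
counts = Counter(
    tok["ner"]
    for doc in sample
    for para in doc["paragraphs"]
    for sent in para["sentences"]
    for tok in sent["tokens"]
)
print(counts)  # e.g. Counter({'O': 4, 'U-ORG': 1, 'U-MISC': 1})
```

If the label counts look wrong at this stage (e.g. almost everything tagged O), the problem is in the conversion rather than the training.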

EDIT: It turned out that I actually needed to pass the --gold-preproc flag for the training to work properly. But I'm not sure what it actually means in this context.

I think you have a problem in your pre-processing. Review the pre-processing step: you have to collect a list of sentences where each line is a token, and a blank line separates sentences from each other.

Also, pay attention to these tokens:

-DOCSTART-

They are just separators between documents. I had that problem as well, and my results were bad. If you want, have a look at how I pre-processed it (for other purposes, not for spaCy).
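A minimal sketch of that kind of pre-processing, assuming the standard CoNLL 2003 layout (one token per line with space-separated columns, the NER tag last, blank lines between sentences), which drops the -DOCSTART- separator lines:

```python
def read_conll(text):
    """Split CoNLL-formatted text into sentences of (token, ner) pairs,
    skipping -DOCSTART- document separators."""
    sentences, current = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:  # blank line ends the current sentence
            if current:
                sentences.append(current)
                current = []
            continue
        cols = line.split()
        if cols[0] == "-DOCSTART-":  # document separator, not a real token
            continue
        current.append((cols[0], cols[-1]))  # token and its NER tag
    if current:  # flush a trailing sentence with no final blank line
        sentences.append(current)
    return sentences

sample = """-DOCSTART- -X- -X- O

EU NNP B-NP B-ORG
rejects VBZ B-VP O
German JJ B-NP B-MISC
call NN I-NP O
"""
print(read_conll(sample))
```

With this, the -DOCSTART- lines never reach the model as tokens, which is exactly the trap described above.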
