
Google Document AI training fails due to an error that is already addressed

I am training a model using Google's Document AI. The training fails with the following error (for simplicity I have included only part of the JSON file, but the error is identical for all documents in my dataset):

"trainingDatasetValidation": {
      "documentErrors": [
        {
          "code": 3,
          "message": "Invalid document.",
          "details": [
            {
              "@type": "type.googleapis.com/google.rpc.ErrorInfo",
              "reason": "INVALID_DOCUMENT",
              "domain": "documentai.googleapis.com",
              "metadata": {
                "num_fields": "0",
                "num_fields_needed": "1",
                "document": "5e88c5e4cc05ddb8.json",
                "annotation_name": "INCOME_ADJUSTMENTS",
                "field_name": "entities.text_anchor.text_segments"
              }
            }
          ]
        }

What I understand from this error is that the model expects the field INCOME_ADJUSTMENTS to appear at least once in the document, but instead finds zero instances of it.

That would have been understandable, except that I have already defined the field INCOME_ADJUSTMENTS in my schema as "Optional Once", i.e., this field can appear either zero or one time.

[Image: schema options list]

Am I missing something? Why does this error persist despite the fact that it is addressed in the schema?

P.S. I have also tried "Optional multiple" (as well as "Required once" and "Required multiple") and the error persists.

EDIT: As requested, here's what one of the JSON files looks like. Note that there is no PII here, as the details (name, SSN, etc.) are synthetic data.

I had the same issue in the past and am having it again right now.

What I managed to do was take the document string from the error message and then search for the corresponding image in the Storage bucket that holds the dataset.

Then I opened the image and looked for the same image in my labeled dataset of 1000+ images.

Then I deleted the bounding box for the label with the issue and relabeled it. This seemed to solve 90% of the issues I had.

It's a ton of manual work, and I wish Google had put more thought into the web app for Doc AI when they released it, because the ML part is great but the app is really lackluster.

I would also be very happy to hear of any other fixes.
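To cut down on the manual lookup, the first step (collecting the document names from the error message) can be scripted. This is only a rough sketch: it assumes you have saved the `trainingDatasetValidation` object from the failed operation as a Python dict, and the helper name `failing_documents` is made up for illustration. It relies only on the keys visible in the error shown in the question.

```python
from __future__ import annotations


def failing_documents(validation: dict) -> dict[str, set[str]]:
    """Map each failing document name to the annotation names reported for it.

    `validation` is the value of "trainingDatasetValidation" from the error.
    """
    result: dict[str, set[str]] = {}
    for error in validation.get("documentErrors", []):
        for detail in error.get("details", []):
            metadata = detail.get("metadata", {})
            document = metadata.get("document")
            if document:
                # Group labels by document so each file is opened only once.
                label = metadata.get("annotation_name", "<unknown>")
                result.setdefault(document, set()).add(label)
    return result
```

The result tells you which files to look up in the bucket and which label to delete and redraw in each one.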

I had the same problem, so I deleted my entire dataset, then imported and relabeled everything again. After that, the training worked fine.

I had this problem, reported as an "internal error", when I had bounding boxes that intersected each other. Check your label definitions and remove any labels whose boxes cross each other. Unfortunately, the error does not give any hint as to which document has the problem, so you might have to scroll through them all.

Also, I had some bounding boxes on empty fields. I do not know whether these contributed to the error, but I removed them along with the intersecting boxes.

After this, I could run the training process without errors.
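If scrolling through every document is too slow, the intersection check itself is easy to script once you have the box coordinates. A minimal sketch, assuming each label's box has already been reduced to an axis-aligned `(x_min, y_min, x_max, y_max)` tuple; how you get those coordinates out of your exported files is not covered here, and the function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def boxes_intersect(a: Box, b: Box) -> bool:
    """True if the two axis-aligned boxes share any area.

    Boxes that only touch along an edge are not counted as intersecting.
    """
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def find_overlapping_pairs(boxes: Dict[str, Box]) -> List[Tuple[str, str]]:
    """Return every pair of label names whose boxes intersect."""
    names = sorted(boxes)
    return [
        (first, second)
        for first, second in combinations(names, 2)
        if boxes_intersect(boxes[first], boxes[second])
    ]
```

Run it once per page, with one entry per label on that page. Any pair it returns is a candidate for removal or relabeling.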

This bug may have been fixed. I am now seeing an error message "Cannot create labels with empty values" on images that previously generated the error.
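The newer message, together with the `entities.text_anchor.text_segments` field named in the original error, suggests that the affected labels have no text attached to them. That is my reading, not something confirmed by Google. A rough sketch for finding such labels in exported files, assuming they follow the Document JSON layout with an `entities` list whose items carry `textAnchor.textSegments` (the snake_case spelling is also accepted); the function names are made up for illustration.

```python
from __future__ import annotations

import json
from pathlib import Path


def entities_without_text_segments(doc: dict) -> list[str]:
    """Return the types of entities whose text anchor has no text segments."""
    flagged = []
    for entity in doc.get("entities", []):
        # Accept both camelCase (JSON export) and snake_case key spellings.
        anchor = entity.get("textAnchor") or entity.get("text_anchor") or {}
        segments = anchor.get("textSegments") or anchor.get("text_segments") or []
        if not segments:
            flagged.append(entity.get("type", "<unknown>"))
    return flagged


def scan_directory(directory: str) -> dict[str, list[str]]:
    """Check every *.json file in a local directory and report flagged labels."""
    report = {}
    for path in sorted(Path(directory).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            flagged = entities_without_text_segments(json.load(handle))
        if flagged:
            report[path.name] = flagged
    return report
```

Pointing `scan_directory` at a local copy of the dataset would list, per file, the labels that probably need to be deleted or redrawn.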
