
Document AI Contract Processor - batchProcessDocuments ignores fieldMask

My aim is to reduce the size of the output JSON file, which by default includes the base64-encoded image sections of the documents.

I am using the Document AI Contract Processor in the US region with the Node.js SDK.

It is my understanding that setting the fieldMask attribute in the batchProcessDocuments request limits which properties appear in the resulting JSON. I want to keep only the entities property.

Here are my call parameters:

const documentai = require('@google-cloud/documentai').v1;
const client = new documentai.DocumentProcessorServiceClient(options);
let params = {
    "name": "projects/XXX/locations/us/processors/3e85a4841d13ce5",
    "region": "us",
    "inputDocuments": {
        "gcsDocuments": {
            "documents": [{
                "mimeType": "application/pdf",
                "gcsUri": "gs://bubble-bucket-XXX/files/CymbalContract.pdf"
            }]
        }
    },
    "documentOutputConfig": {
        "gcsOutputConfig": {
            "gcsUri": "gs://bubble-bucket-XXXX/ocr/"
        },
        "fieldMask": {
            "paths": [
                "entities"
            ]
        }
    }
};
client.batchProcessDocuments(params, function(error, operation) {
    if (error) {
        return reject(error);
    }
    return resolve({
        "operationName": operation.name
    });

});

However, the resulting JSON still contains the full set of data.

Am I missing something here?

The auto-generated documentation for the Node.js client library is a little hard to follow, but it looks like fieldMask should be a member of gcsOutputConfig rather than documentOutputConfig. (I'm surprised the API didn't throw an error.)

https://cloud.google.com/nodejs/docs/reference/documentai/latest/documentai/protos.google.cloud.documentai.v1.documentoutputconfig.gcsoutputconfig

The REST docs are a little clearer:

https://cloud.google.com/document-ai/docs/reference/rest/v1/DocumentOutputConfig#gcsoutputconfig
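Here is a minimal sketch of the request with fieldMask moved inside gcsOutputConfig, as suggested above. The helper name buildBatchRequest is hypothetical, the processor name and bucket URIs are the placeholders from the question, and the commented-out call assumes the package is installed and credentials are configured. I have not verified the output against a live processor.

```javascript
// Hypothetical helper: builds a batchProcessDocuments request in which
// fieldMask is nested under gcsOutputConfig (not documentOutputConfig).
function buildBatchRequest(processorName, inputUri, outputUri, maskPaths) {
  return {
    name: processorName,
    inputDocuments: {
      gcsDocuments: {
        documents: [{ mimeType: 'application/pdf', gcsUri: inputUri }],
      },
    },
    documentOutputConfig: {
      gcsOutputConfig: {
        gcsUri: outputUri,
        // FieldMask in object form: a list of field paths to keep.
        fieldMask: { paths: maskPaths },
      },
    },
  };
}

// Placeholder values copied from the question.
const request = buildBatchRequest(
  'projects/XXX/locations/us/processors/3e85a4841d13ce5',
  'gs://bubble-bucket-XXX/files/CymbalContract.pdf',
  'gs://bubble-bucket-XXXX/ocr/',
  ['entities']
);

console.log(JSON.stringify(request.documentOutputConfig, null, 2));

// Sending the request (needs @google-cloud/documentai and credentials):
// const { DocumentProcessorServiceClient } = require('@google-cloud/documentai').v1;
// const client = new DocumentProcessorServiceClient();
// const [operation] = await client.batchProcessDocuments(request);
// await operation.promise(); // wait for the batch job to finish
```

The only change from the request in the question is where fieldMask sits; everything else is passed through unchanged.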


Note: For a REST API call and for other client libraries, the fieldMask is structured as a string (e.g. text,entities,pages.pageNumber).

I haven't tried this with the Node.js client library before, but I'd recommend trying the string form as well if moving the parameter doesn't work on its own.

https://cloud.google.com/document-ai/docs/send-request#async-processor
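To illustrate the string form mentioned in the note, here is a small sketch that turns a list of paths into the comma-separated fieldMask string and places it in a REST-style output config. The helper name toFieldMaskString is hypothetical, and whether the Node.js client accepts the string form is untested, as said above.

```javascript
// Hypothetical helper: joins field paths into the comma-separated
// string form used for fieldMask in REST request bodies.
function toFieldMaskString(paths) {
  return paths.join(',');
}

const mask = toFieldMaskString(['text', 'entities', 'pages.pageNumber']);

// REST-style documentOutputConfig with the mask as a plain string.
const restOutputConfig = {
  gcsOutputConfig: {
    gcsUri: 'gs://bubble-bucket-XXXX/ocr/', // placeholder from the question
    fieldMask: mask,
  },
};

console.log(JSON.stringify(restOutputConfig, null, 2));
```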
