I am trying to use the Invoice Parser from Google's Document AI API. I keep getting the error below even though I have followed all the required steps in their documentation: I updated my Python packages, installed everything as instructed, set all the environment variables, and created a service account specifically for this.
I am running this on Linux Mint 21.
My code:
import os
import google.auth

credentials, project = google.auth.default()
print(credentials)
print(project)
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = '/home/build/invoice-text-extraction/service_account.json'

LOCATION = 'us'
PROJECT_ID = 'my_project'
PROCESSOR_ID = 'akjdh2513h1j2'
FILE_PATH = 'files/Final Collection 251835416.pdf'


def quickstart(project_id: str, location: str, processor_id: str, file_path: str):
    from google.cloud import documentai as documentai

    # You must set the api_endpoint if you use a location other than 'us', e.g.:
    opts = {}
    if location == "eu":
        opts = {"api_endpoint": "eu-documentai.googleapis.com"}

    client = documentai.DocumentProcessorServiceClient(client_options=opts, credentials=credentials)

    # The full resource name of the processor, e.g.:
    # projects/project-id/locations/location/processor/processor-id
    # You must create new processors in the Cloud Console first
    name = f"projects/{project_id}/locations/{location}/processors/{processor_id}"

    # Read the file into memory
    with open(file_path, "rb") as image:
        image_content = image.read()

    document = {"content": image_content, "mime_type": "application/pdf"}

    # Configure the process request
    request = {"name": name, "raw_document": document}

    result = client.process_document(request=request)
    document = result.document
    document_pages = document.pages

    # For a full list of Document object attributes, please reference this page: https://googleapis.dev/python/documentai/latest/_modules/google/cloud/documentai_v1beta3/types/document.html#Document

    # Read the text recognition output from the processor
    print("The document contains the following paragraphs:")
    for page in document_pages:
        paragraphs = page.paragraphs
        for paragraph in paragraphs:
            print(paragraph)
            paragraph_text = get_text(paragraph.layout, document)
            print(f"Paragraph text: {paragraph_text}")


def get_text(doc_element: dict, document: dict):
    """
    Document AI identifies form fields by their offsets
    in document text. This function converts offsets
    to text snippets.
    """
    response = ""
    # If a text segment spans several lines, it will
    # be stored in different text segments.
    for segment in doc_element.text_anchor.text_segments:
        start_index = (
            int(segment.start_index)
            if segment in doc_element.text_anchor.text_segments
            else 0
        )
        end_index = int(segment.end_index)
        response += document.text[start_index:end_index]
    return response


quickstart(PROCESSOR_ID, LOCATION, PROCESSOR_ID, FILE_PATH)
The error:
Traceback (most recent call last):
  File "main.py", line 71, in <module>
    quickstart(PROCESSOR_ID, LOCATION, PROCESSOR_ID, FILE_PATH)
  File "main.py", line 19, in quickstart
    client = documentai.DocumentProcessorServiceClient(client_options=opts)
  File "/usr/local/lib/python3.8/dist-packages/google/cloud/documentai_v1/services/document_processor_service/client.py", line 364, in __init__
    self._transport = Transport(
  File "/usr/local/lib/python3.8/dist-packages/google/cloud/documentai_v1/services/document_processor_service/transports/grpc.py", line 166, in __init__
    self._grpc_channel = type(self).create_channel(
  File "/usr/local/lib/python3.8/dist-packages/google/cloud/documentai_v1/services/document_processor_service/transports/grpc.py", line 218, in create_channel
    return grpc_helpers.create_channel(
  File "/usr/local/lib/python3.8/dist-packages/google/api_core/grpc_helpers.py", line 306, in create_channel
    composite_credentials = _create_composite_credentials(
  File "/usr/local/lib/python3.8/dist-packages/google/api_core/grpc_helpers.py", line 236, in _create_composite_credentials
    credentials = google.auth.credentials.with_scopes_if_required(
TypeError: with_scopes_if_required() got an unexpected keyword argument 'default_scopes'
I am using the Python code example they provide in their documentation.
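The last frame of the traceback shows google-api-core calling google.auth.credentials.with_scopes_if_required with a default_scopes keyword that the installed function does not accept. One plausible reading, which is an assumption and not something confirmed in this thread, is that the installed google-auth is older than what the installed google-api-core expects. A minimal sketch to check this on the affected machine follows; accepts_keyword and check_google_auth are hypothetical helper names introduced here for illustration.

```python
import inspect
from importlib import metadata  # available in the standard library from Python 3.8


def accepts_keyword(func, keyword):
    """Return True if func can be called with the given keyword argument."""
    params = inspect.signature(func).parameters
    param = params.get(keyword)
    if param is not None and param.kind in (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    ):
        return True
    # A **kwargs parameter would also swallow the keyword.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


def check_google_auth():
    """Report installed versions and whether default_scopes is supported.

    Assumption: both packages were installed under their usual
    distribution names, google-auth and google-api-core.
    """
    # Imported here so accepts_keyword stays usable without google-auth.
    from google.auth import credentials

    return {
        "google-auth": metadata.version("google-auth"),
        "google-api-core": metadata.version("google-api-core"),
        "supports_default_scopes": accepts_keyword(
            credentials.with_scopes_if_required, "default_scopes"
        ),
    }


if __name__ == "__main__":
    print(check_google_auth())
```

If supports_default_scopes comes back False, upgrading google-auth (for example with pip install --upgrade google-auth) is a reasonable first thing to try. It is also worth checking that the Python interpreter running main.py is the same one the packages were upgraded for.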
Since this was originally posted, the Document AI API has added the ability to specify a field_mask in the processing request, which limits the fields returned in the Document object output. This can reduce latency for some requests because the response is smaller.
https://cloud.google.com/document-ai/docs/send-request#online-processor
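As a rough illustration of the field_mask option described above, the sketch below passes a comma-separated list of Document field paths with the request. It assumes a recent google-cloud-documentai release. The function names, and the particular paths used in the usage note, are illustrative choices rather than anything prescribed by this thread. Passing the mask as a plain string follows the pattern in Google's published sample as far as I know, so treat that detail as an assumption to verify against the linked page.

```python
from typing import Sequence


def build_field_mask(paths: Sequence[str]) -> str:
    """Join Document field paths into a comma-separated mask string."""
    return ",".join(paths)


def process_with_field_mask(
    project_id: str,
    location: str,
    processor_id: str,
    file_path: str,
    field_mask: str,
):
    """Process one PDF and return only the Document fields named in field_mask."""
    # Imported here so build_field_mask works without the client library.
    from google.cloud import documentai

    # Same endpoint handling as the quickstart in the question.
    opts = {}
    if location == "eu":
        opts = {"api_endpoint": "eu-documentai.googleapis.com"}
    client = documentai.DocumentProcessorServiceClient(client_options=opts)

    name = f"projects/{project_id}/locations/{location}/processors/{processor_id}"

    with open(file_path, "rb") as pdf:
        raw_document = documentai.RawDocument(
            content=pdf.read(), mime_type="application/pdf"
        )

    request = documentai.ProcessRequest(
        name=name,
        raw_document=raw_document,
        field_mask=field_mask,  # assumption: string form, as in Google's sample
    )
    return client.process_document(request=request).document
```

For an invoice, something like process_with_field_mask(PROJECT_ID, LOCATION, PROCESSOR_ID, FILE_PATH, build_field_mask(["text", "entities"])) would request only the full text and the extracted entities. Note that get_text in the question reads document.text, so "text" needs to stay in the mask if that helper is still used.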