How to extract or read an image from a Google Doc using Python

Question

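I'm trying to read data from my Google Doc. I'm using Python and have set up the Google Docs API. I basically copy-pasted the sample code Google provides and made a few modifications, and I managed to read the data line by line, but only the text! Now I'm trying something new: I inserted an image. Here is what it looks like.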

[Image: the content of my Google Doc]

[Link to the Google Doc]

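Pretty simple, right? It has a bullet with sub-bullets containing an image and the text "Hello". When I read the data (it reads it line by line), I printed what the API returns: for each line it is a dictionary containing nested dictionaries. Here is what it looks like.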

{'startIndex': 1, 'endIndex': 41, 'paragraph': {'elements': [{'startIndex': 1, 'endIndex': 41, 'textRun': {'content': 'This is the Python Programming Language\n', 'textStyle': {}}}], 'paragraphStyle': {'namedStyleType': 'NORMAL_TEXT', 'direction': 'LEFT_TO_RIGHT', 'indentFirstLine': {'magnitude': 18, 'unit': 'PT'}, 'indentStart': {'magnitude': 36, 'unit': 'PT'}}, 'bullet': {'listId': 'kix.y7w314ij0ywy', 'textStyle': {'underline': False}}}}


{'startIndex': 41, 'endIndex': 43, 'paragraph': {'elements': [{'startIndex': 41, 'endIndex': 42, 'inlineObjectElement': {'inlineObjectId': 'kix.o4cuh6wash2n', 'textStyle': {}}}, {'startIndex': 42, 'endIndex': 43, 'textRun': {'content': '\n', 'textStyle': {}}}], 'paragraphStyle': {'namedStyleType': 'NORMAL_TEXT', 'direction': 'LEFT_TO_RIGHT', 'indentFirstLine': {'magnitude': 54, 'unit': 'PT'}, 'indentStart': {'magnitude': 72, 'unit': 'PT'}}, 'bullet': {'listId': 'kix.y7w314ij0ywy', 'nestingLevel': 1, 'textStyle': {'underline': False}}}}


{'startIndex': 43, 'endIndex': 49, 'paragraph': {'elements': [{'startIndex': 43, 'endIndex': 49, 'textRun': {'content': 'Hello\n', 'textStyle': {}}}], 'paragraphStyle': {'namedStyleType': 'NORMAL_TEXT', 'direction': 'LEFT_TO_RIGHT', 'indentFirstLine': {'magnitude': 54, 'unit': 'PT'}, 'indentStart': {'magnitude': 72, 'unit': 'PT'}}, 'bullet': {'listId': 'kix.y7w314ij0ywy', 'nestingLevel': 1, 'textStyle': {'underline': False}}}}

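As you can see, there are 3 dictionaries made of key-value pairs, one for each line of the document. You can also see the key content (nested under paragraph → elements → textRun), whose value is the text of that line.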

These are the nested textRun dictionaries:

{'content': 'This is the Python Programming Language\n', 'textStyle': {}}
{'content': '\n', 'textStyle': {}}
{'content': 'Hello\n', 'textStyle': {}}
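To make the structure above concrete, here is a minimal sketch that collects the text of each paragraph by joining the content values of its textRun elements, which is essentially how the line-by-line text reading works. The helper name paragraph_texts is hypothetical, and the sample data is a trimmed copy of the output printed above (styles removed). Non-paragraph elements such as a sectionBreak are simply skipped here.

```python
# Trimmed copy of the three structural elements printed above (styles removed).
SAMPLE_CONTENT = [
    {'startIndex': 1, 'endIndex': 41, 'paragraph': {'elements': [
        {'startIndex': 1, 'endIndex': 41,
         'textRun': {'content': 'This is the Python Programming Language\n'}},
    ]}},
    {'startIndex': 41, 'endIndex': 43, 'paragraph': {'elements': [
        {'startIndex': 41, 'endIndex': 42,
         'inlineObjectElement': {'inlineObjectId': 'kix.o4cuh6wash2n'}},
        {'startIndex': 42, 'endIndex': 43, 'textRun': {'content': '\n'}},
    ]}},
    {'startIndex': 43, 'endIndex': 49, 'paragraph': {'elements': [
        {'startIndex': 43, 'endIndex': 49, 'textRun': {'content': 'Hello\n'}},
    ]}},
]


def paragraph_texts(content):
    """Hypothetical helper: return the joined textRun content of each paragraph."""
    texts = []
    for structural_element in content:
        paragraph = structural_element.get('paragraph')
        if paragraph is None:  # e.g. sectionBreak or table: skipped in this sketch
            continue
        # Only textRun elements carry text; inlineObjectElement entries are ignored.
        texts.append(''.join(
            el['textRun']['content']
            for el in paragraph.get('elements', [])
            if 'textRun' in el
        ))
    return texts


print(paragraph_texts(SAMPLE_CONTENT))
# ['This is the Python Programming Language\n', '\n', 'Hello\n']
```

Because the image is not a textRun, its paragraph contributes only the trailing newline.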

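I noticed that for the line containing the image it returns only '\n'. I also looked for a key whose value might at least be a temporary URL of the image, but there doesn't seem to be one. So my question is: is there a way to read (and extract) this image with the API I'm using? Maybe I'm just missing something... Can anyone help? Any alternative solutions are much appreciated! Thanks!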
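The image line is not only '\n', though: in the printed output, that paragraph's elements list also contains an inlineObjectElement whose inlineObjectId (kix.o4cuh6wash2n) identifies the image. A minimal sketch that collects those IDs from the body content, using a trimmed copy of the printed output; the function name inline_object_ids is hypothetical, and images inside tables would be missed because only top-level paragraphs are scanned.

```python
# Trimmed copy of the image line and the "Hello" line printed above.
SAMPLE_CONTENT = [
    {'startIndex': 41, 'endIndex': 43, 'paragraph': {'elements': [
        {'startIndex': 41, 'endIndex': 42,
         'inlineObjectElement': {'inlineObjectId': 'kix.o4cuh6wash2n'}},
        {'startIndex': 42, 'endIndex': 43, 'textRun': {'content': '\n'}},
    ]}},
    {'startIndex': 43, 'endIndex': 49, 'paragraph': {'elements': [
        {'startIndex': 43, 'endIndex': 49, 'textRun': {'content': 'Hello\n'}},
    ]}},
]


def inline_object_ids(content):
    """Hypothetical helper: return (startIndex, inlineObjectId) for each inline object."""
    found = []
    for structural_element in content:
        # Non-paragraph elements (sectionBreak, table, ...) have no 'paragraph' key.
        paragraph = structural_element.get('paragraph', {})
        for el in paragraph.get('elements', []):
            inline = el.get('inlineObjectElement')
            if inline is not None:
                found.append((el['startIndex'], inline['inlineObjectId']))
    return found


print(inline_object_ids(SAMPLE_CONTENT))
# [(41, 'kix.o4cuh6wash2n')]
```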

By the way, here is the source code Google provides. I modified the read_strucutural_elements function to see how it reads the data for my own purposes; as you can see, the API returns one dictionary per line of the document.

import os
import pickle

from google.auth.transport.requests import Request
from google_auth_oauthlib.flow import InstalledAppFlow
from googleapiclient.discovery import build

# Read-only access is enough for documents().get().
SCOPES = ['https://www.googleapis.com/auth/documents.readonly']
# Placeholder: replace with the ID taken from your document's URL.
DOCUMENT_ID = 'YOUR_DOCUMENT_ID'


def main():
    """Shows basic usage of the Docs API.
    Fetches a document and passes its body content to read_strucutural_elements.
    """
    creds = None
    # The file token.pickle stores the user's access and refresh tokens, and is
    # created automatically when the authorization flow completes for the first
    # time.
    if os.path.exists('token.pickle'):
        with open('token.pickle', 'rb') as token:
            creds = pickle.load(token)
    # If there are no (valid) credentials available, let the user log in.
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file(
                'credentials.json', SCOPES)
            creds = flow.run_local_server(port=0)
        # Save the credentials for the next run
        with open('token.pickle', 'wb') as token:
            pickle.dump(creds, token)

    service = build('docs', 'v1', credentials=creds)

    # Retrieve the document's contents from the Docs service.
    document = service.documents().get(documentId=DOCUMENT_ID).execute()

    # print('The title of the document is: {}'.format(document.get('title')))
    # read_strucutural_elements only prints, so its return value is not stored.
    read_strucutural_elements(document.get('body').get('content'))

Here is the read_strucutural_elements function. I simply print each item of the elements parameter, which holds the data shown above, one dictionary per line.

def read_strucutural_elements(elements):
    for value in elements:
        # Each value is one structural element (one line), i.e. the dictionaries shown above.
        print(value)
        print()


if __name__ == '__main__':
    main()

Thanks a lot!

Answer 1

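Looking at the dictionary output, the image is an inlineObject with a specific ID (kix.o4cuh6wash2n in your output). You should be able to retrieve the image through its URL. To get that URL, see the related question: How to get the url to Google doc image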
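The answer's suggestion can be sketched in Python: the inlineObjectId is looked up in the top-level inlineObjects map of the same document returned by documents().get(), and in the Docs API v1 Document resource each entry's inlineObjectProperties.embeddedObject.imageProperties.contentUri holds a URL for the image (documented as having a default lifetime of 30 minutes). This is a minimal sketch, assuming `document` is that dict; the helper names image_urls and download_image are hypothetical, only top-level paragraphs are scanned, and the sample dict is a trimmed stand-in with a placeholder URL, not real API output.

```python
import urllib.request


def image_urls(document):
    """Hypothetical helper: map each inlineObjectId in the body to its contentUri."""
    inline_objects = document.get('inlineObjects', {})
    urls = {}
    for structural_element in document.get('body', {}).get('content', []):
        for el in structural_element.get('paragraph', {}).get('elements', []):
            inline = el.get('inlineObjectElement')
            if inline is None:
                continue
            object_id = inline['inlineObjectId']
            # Walk inlineObjects[id] -> inlineObjectProperties -> embeddedObject.
            embedded = (inline_objects.get(object_id, {})
                        .get('inlineObjectProperties', {})
                        .get('embeddedObject', {}))
            uri = embedded.get('imageProperties', {}).get('contentUri')
            if uri:
                urls[object_id] = uri
    return urls


def download_image(url, filename):
    """Hypothetical helper: save the image at url to filename (fetch before it expires)."""
    urllib.request.urlretrieve(url, filename)


# Trimmed stand-in for a documents().get() result; the URL is a placeholder.
SAMPLE_DOCUMENT = {
    'body': {'content': [
        {'startIndex': 41, 'endIndex': 43, 'paragraph': {'elements': [
            {'startIndex': 41, 'endIndex': 42,
             'inlineObjectElement': {'inlineObjectId': 'kix.o4cuh6wash2n'}},
            {'startIndex': 42, 'endIndex': 43, 'textRun': {'content': '\n'}},
        ]}},
    ]},
    'inlineObjects': {'kix.o4cuh6wash2n': {'inlineObjectProperties': {
        'embeddedObject': {'imageProperties': {
            'contentUri': 'https://example.com/placeholder-image'}}}}},
}

print(image_urls(SAMPLE_DOCUMENT))
# {'kix.o4cuh6wash2n': 'https://example.com/placeholder-image'}
```

With the real `document` fetched in main(), each entry could then be saved with something like download_image(url, object_id + '.png'); the .png extension is an assumption, since contentUri does not itself state the image format.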

Solution 1: 0 votes, accepted, 2020-10-21 10:29:45
