
How to extract or read Image from Google Doc using Python

I am trying to read data from my Google Docs. I am using Python, and I have set up the Google Docs API with it. I copied and pasted the sample code provided by Google, made some modifications, and successfully read the data LINE BY LINE, but TEXT ONLY! Now I am trying something new and have inserted an image. Here is what it looks like.

[Image: screenshot of my Google Doc content]

Google Doc Link

Very simple, right? It has a bullet point and sub-bullet points containing an image and a "Hello" text. When I read the data (it reads it line by line), I printed out what the API returns, and it returns a dictionary containing further nested dictionaries. Here is what it looks like.

{'startIndex': 1, 'endIndex': 41, 'paragraph': {'elements': [{'startIndex': 1, 'endIndex': 41, 'textRun': {'content': 'This is the Python Programming Language\n', 'textStyle': {}}}], 'paragraphStyle': {'namedStyleType': 'NORMAL_TEXT', 'direction': 'LEFT_TO_RIGHT', 'indentFirstLine': {'magnitude': 18, 'unit': 'PT'}, 'indentStart': {'magnitude': 36, 'unit': 'PT'}}, 'bullet': {'listId': 'kix.y7w314ij0ywy', 'textStyle': {'underline': False}}}}


{'startIndex': 41, 'endIndex': 43, 'paragraph': {'elements': [{'startIndex': 41, 'endIndex': 42, 'inlineObjectElement': {'inlineObjectId': 'kix.o4cuh6wash2n', 'textStyle': {}}}, {'startIndex': 42, 'endIndex': 43, 'textRun': {'content': '\n', 'textStyle': {}}}], 'paragraphStyle': {'namedStyleType': 'NORMAL_TEXT', 'direction': 'LEFT_TO_RIGHT', 'indentFirstLine': {'magnitude': 54, 'unit': 'PT'}, 'indentStart': {'magnitude': 72, 'unit': 'PT'}}, 'bullet': {'listId': 'kix.y7w314ij0ywy', 'nestingLevel': 1, 'textStyle': {'underline': False}}}}


{'startIndex': 43, 'endIndex': 49, 'paragraph': {'elements': [{'startIndex': 43, 'endIndex': 49, 'textRun': {'content': 'Hello\n', 'textStyle': {}}}], 'paragraphStyle': {'namedStyleType': 'NORMAL_TEXT', 'direction': 'LEFT_TO_RIGHT', 'indentFirstLine': {'magnitude': 54, 'unit': 'PT'}, 'indentStart': {'magnitude': 72, 'unit': 'PT'}}, 'bullet': {'listId': 'kix.y7w314ij0ywy', 'nestingLevel': 1, 'textStyle': {'underline': False}}}}

As you can see, there are 3 dictionaries containing their key-value pairs, one for each line of the document. You can also see the key content, whose value is the text from the document.

If you look at the nested dictionaries, they are these:

{'content': 'This is the Python Programming Language\n', 'textStyle': {}}
{'content': '\n', 'textStyle': {}}
{'content': 'Hello\n', 'textStyle': {}}
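Each dictionary above is one structural element of the document body. Its 'paragraph' entry holds a list of 'elements', each of one kind, such as 'textRun' (text) or 'inlineObjectElement' (an embedded object such as the image). The textRun dictionaries listed above are only part of each paragraph: the image line also contains an inlineObjectElement before its '\n'. A minimal sketch, assuming only the dictionary shape printed above (describe_paragraph is a hypothetical helper name):

```python
def describe_paragraph(structural_element):
    """Split one structural element from body['content'] into text and inline object IDs.

    Hypothetical helper that assumes the dictionary shape printed above.
    Returns (text, inline_object_ids); non-paragraph elements yield ('', []).
    """
    paragraph = structural_element.get('paragraph')
    if paragraph is None:
        return '', []

    text_parts = []
    inline_object_ids = []
    for element in paragraph.get('elements', []):
        if 'textRun' in element:
            # Plain text, including the newline that ends the paragraph.
            text_parts.append(element['textRun'].get('content', ''))
        elif 'inlineObjectElement' in element:
            # An embedded object such as an image: only its ID is stored here.
            inline_object_ids.append(element['inlineObjectElement']['inlineObjectId'])
    return ''.join(text_parts), inline_object_ids


# The second dictionary printed above (the image line), trimmed to the relevant keys.
image_line = {
    'startIndex': 41, 'endIndex': 43,
    'paragraph': {'elements': [
        {'startIndex': 41, 'endIndex': 42,
         'inlineObjectElement': {'inlineObjectId': 'kix.o4cuh6wash2n', 'textStyle': {}}},
        {'startIndex': 42, 'endIndex': 43,
         'textRun': {'content': '\n', 'textStyle': {}}},
    ]},
}
print(describe_paragraph(image_line))  # ('\n', ['kix.o4cuh6wash2n'])
```

This is why looking only at textRun content shows just '\n' for the image line: the image is the element before it, identified by its inlineObjectId.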

What I've noticed is that it returned a \n for the line that contains the image. I also looked for a key whose value might be a temporary URL for the image, but there doesn't seem to be one. So my question is: is there a way to read this image (and also extract it) using the API I am using? Probably I am just missing something... Can someone help me with this? Any alternative solution would be very much appreciated! Thank you!

By the way, here is the source code provided by Google. I have modified the read_strucutural_elements function to read the data for my own purposes, but as you can see, the API returns a dictionary for each line of data. I've also noticed that the API really does read the document line by line and returns a dictionary for each line.

import os.path
import pickle

from google.auth.transport.requests import Request
from google_auth_oauthlib.flow import InstalledAppFlow
from googleapiclient.discovery import build

# Read-only access is enough to fetch a document.
SCOPES = ['https://www.googleapis.com/auth/documents.readonly']
# Placeholder: put the ID of your own document here (the long ID in its URL).
DOCUMENT_ID = 'your-document-id'


def main():
    """Fetches the document with the Docs API and passes its body content
    to read_strucutural_elements.
    """
    creds = None
    # The file token.pickle stores the user's access and refresh tokens, and is
    # created automatically when the authorization flow completes for the first
    # time.
    if os.path.exists('token.pickle'):
        with open('token.pickle', 'rb') as token:
            creds = pickle.load(token)
    # If there are no (valid) credentials available, let the user log in.
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file(
                'credentials.json', SCOPES)
            creds = flow.run_local_server(port=0)
        # Save the credentials for the next run
        with open('token.pickle', 'wb') as token:
            pickle.dump(creds, token)

    service = build('docs', 'v1', credentials=creds)

    # Retrieve the document's contents from the Docs service.
    document = service.documents().get(documentId=DOCUMENT_ID).execute()

    #print('The title of the document is: {}'.format(document.get('title')))
    data = read_strucutural_elements(document.get("body").get("content"))

Here is the read_strucutural_elements function. I just print out each element of the elements parameter, which contains the data line by line.

def read_strucutural_elements(elements):
    """Prints every structural element (one dictionary per line/paragraph)."""
    for value in elements:
        print(value)  # each value is one of the dictionaries shown above
        print()


if __name__ == '__main__':
    main()
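The read_strucutural_elements function above only prints each element. As a sketch (collect_inline_object_ids is a hypothetical name), the same loop can collect the IDs of inline objects such as images. It also recurses into tables and a table of contents, which hold their own lists of structural elements. The sample data at the bottom is made up for illustration.

```python
def collect_inline_object_ids(elements):
    """Return the inline object IDs (e.g. images) found in a list of structural elements.

    Hypothetical variant of read_strucutural_elements: walks paragraphs and
    recurses into table cells and a table of contents.
    """
    ids = []
    for value in elements:
        if 'paragraph' in value:
            for element in value['paragraph'].get('elements', []):
                inline = element.get('inlineObjectElement')
                if inline is not None:
                    ids.append(inline['inlineObjectId'])
        elif 'table' in value:
            # Each table cell contains its own list of structural elements.
            for row in value['table'].get('tableRows', []):
                for cell in row.get('tableCells', []):
                    ids.extend(collect_inline_object_ids(cell.get('content', [])))
        elif 'tableOfContents' in value:
            ids.extend(collect_inline_object_ids(value['tableOfContents'].get('content', [])))
    return ids


# Made-up body content: a text line, an image line, and a table holding another image.
body_content = [
    {'paragraph': {'elements': [{'textRun': {'content': 'Intro\n'}}]}},
    {'paragraph': {'elements': [
        {'inlineObjectElement': {'inlineObjectId': 'kix.a'}},
        {'textRun': {'content': '\n'}},
    ]}},
    {'table': {'tableRows': [{'tableCells': [
        {'content': [{'paragraph': {'elements': [
            {'inlineObjectElement': {'inlineObjectId': 'kix.b'}}]}}]},
    ]}]}},
]
print(collect_inline_object_ids(body_content))  # ['kix.a', 'kix.b']
```

In main(), it would be called as collect_inline_object_ids(document.get("body").get("content")).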

Thank you very much!

Looking at the dictionary output, the image is an inlineObject with a specific ID. You should be able to retrieve the image using its URL. To get the URL, see the related question: How to get the url to Google doc image
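The ID alone is not the image. In the documents().get() response, the top-level 'inlineObjects' map is keyed by the same inlineObjectId, and for images the Docs API v1 exposes a contentUri under inlineObjectProperties.embeddedObject.imageProperties. According to the API reference, this URI has a default lifetime of 30 minutes and is tagged with the requesting account, so it should be downloaded soon after fetching the document. A minimal sketch, assuming those field names; image_content_uris and download_image are hypothetical helpers, and the file name and extension in the usage comment are a guess.

```python
import urllib.request


def image_content_uris(document):
    """Map inline object IDs to image URLs in a documents().get() response.

    Assumes the Docs API v1 layout:
    document['inlineObjects'][id]['inlineObjectProperties']['embeddedObject']
        ['imageProperties']['contentUri'].
    Objects without imageProperties (not images) are skipped.
    """
    uris = {}
    for object_id, inline_object in document.get('inlineObjects', {}).items():
        embedded = inline_object.get('inlineObjectProperties', {}).get('embeddedObject', {})
        uri = embedded.get('imageProperties', {}).get('contentUri')
        if uri:
            uris[object_id] = uri
    return uris


def download_image(uri, path):
    """Download the image at uri and write the raw bytes to path."""
    with urllib.request.urlopen(uri) as response, open(path, 'wb') as out:
        out.write(response.read())


# Example use inside main(), right after `document = ...execute()`:
# for object_id, uri in image_content_uris(document).items():
#     # The '.png' extension is an assumption; the real format may differ.
#     download_image(uri, object_id.replace('.', '_') + '.png')
```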

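Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original post's URL. For any questions, contact: yoyou2525@163.com.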
