How to access images from PowerPoint (python-pptx)

Question

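I'm having a hard time accessing and saving images with the python-pptx library. When the image's shape type is PICTURE (i.e. shape.shape_type == MSO_SHAPE_TYPE.PICTURE), I can easily access and save the image through its "blob" attribute. Here is the code: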

import argparse
import os
from PIL import Image
import pptx
from pptx.enum.shapes import MSO_SHAPE_TYPE
from pptx import Presentation
from mdutils.mdutils import MdUtils
from mdutils import Html

def main():

    parser = argparse.ArgumentParser()
    parser.add_argument('ppt_name', type=str, help='add the name of the PowerPoint file (NOTE: the folder must be in the same directory as the program file)')
    args = parser.parse_args()
    
    pptx_name = args.ppt_name
    pptx_name_formatted = os.path.splitext(pptx_name)[0]  # strip only the final extension

    prs = Presentation(pptx_name)

    path = '{}_converted'.format(pptx_name_formatted)
    if not os.path.exists(path):
        os.mkdir(path)
    images_folder = '{}_images'.format(pptx_name_formatted)
    images_path = os.path.join(path, images_folder)
    if not os.path.exists(images_path):
        os.mkdir(images_path)

    ppt_dict = {} #Keys: slide numbers, values: slide content
    texts = []
    slide_count = 0
    picture_count = 0
    for slide in prs.slides:
        texts = []
        slide_count += 1
        
        for shape in slide.shapes:
            if shape.has_text_frame:
                if '\n' in shape.text:
                    splitted = shape.text.split('\n')
                    for word in splitted:
                        if word != '':
                            texts.append(word)
                elif shape.text == '':
                    continue
                else:
                    texts.append(shape.text)
            elif shape.shape_type == MSO_SHAPE_TYPE.PICTURE:
                with open('{}/image{}_slide{}.png'.format(images_path, picture_count, slide_count), 'wb') as f:
                    f.write(shape.image.blob)
                    picture_count += 1
        ppt_dict[slide_count] = texts  # record once per slide, after all its shapes are read

    ppt_content = ''
    for k,v in ppt_dict.items():
        ppt_content = ppt_content + ' - Slide number {}\n'.format(k)
        for a in v:
            ppt_content = ppt_content + '\t - {}\n'.format(a)

    mdFile = MdUtils(file_name='{}/{}'.format(path,path)) #second argument isn't path, it just shares the path name.
    mdFile.write(ppt_content)
    mdFile.create_md_file()


if __name__ == "__main__":
    main()

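The problem comes when the picture's shape type is 'auto shape'. I have tried many approaches to no avail. When I run the following code on a shape that I know is a picture: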

            if shape.shape_type == MSO_SHAPE_TYPE.AUTO_SHAPE:
                print(shape.auto_shape_type)
                print(shape.fill.type)

#indented because it's in a for loop

it outputs RECTANGLE for shape.auto_shape_type

and PICTURE for shape.fill.type.

What I want now is to save the picture (perhaps by writing out the image's binary byte stream). Can anyone help?

Answer 1

The "link" to the image (a part, which has the blob) is in the fill definition. Using that link, you can get the image.

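Print the XML around the fill definition using shape.fill._xPr.xml. That will give you an idea of what you need to navigate to. Chances are it will contain something like "rId9", with some other specific number in place of the "9" used in this example, probably near something like a "blipFill" element. The image is used as the "fill" for the shape, which is what is going on here.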

Then get the slide part with something like slide._part, and use its .related_parts "dict" to look up the image "fill" part by its relationship id (the "rId9"-like string).

image_part = slide._part.related_parts["rId9"]

ImagePart is implemented here:
https://github.com/scanny/python-pptx/blob/master/pptx/parts/image.py#L21
It provides access to the image as well as many details about it.

You will have to retrieve the "rId9"-like string with an lxml call, roughly like this:

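# the attribute is namespaced (r:embed); python-pptx elements are expected to resolve the a: and r: prefixes in xpath()
rIds = shape.fill._xPr.xpath(".//a:blip/@r:embed")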
rId = rIds[0]

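You will need to do a bit of research on XPath to work out the right expression for the XML you printed in the earlier step. There is plenty of material on XPath, including here on SO. This is one resource to get started: http://www.rpbourret.com/xml/XPathIn5.htm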
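To make the XPath step a little more concrete, here is a minimal sketch that runs against a hand-written XML fragment rather than a real presentation. The fragment is only an assumption about what shape.fill._xPr.xml might print for a picture-filled rectangle, and embedded_rids is a hypothetical helper name. It uses the standard library's ElementTree instead of lxml, so it runs without python-pptx installed.

```python
import xml.etree.ElementTree as ET

# Hand-written stand-in for the XML of a picture-filled rectangle.
# Real output from shape.fill._xPr.xml will differ in detail.
SAMPLE_SPPR_XML = """\
<p:spPr xmlns:p="http://schemas.openxmlformats.org/presentationml/2006/main"
        xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main"
        xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">
  <a:prstGeom prst="rect"><a:avLst/></a:prstGeom>
  <a:blipFill>
    <a:blip r:embed="rId9"/>
    <a:stretch><a:fillRect/></a:stretch>
  </a:blipFill>
</p:spPr>
"""

# Prefix-to-namespace map needed to address a:blip and its r:embed attribute.
NS = {
    "a": "http://schemas.openxmlformats.org/drawingml/2006/main",
    "r": "http://schemas.openxmlformats.org/officeDocument/2006/relationships",
}


def embedded_rids(sppr_xml):
    """Return the r:embed value of every a:blip element in the fragment."""
    root = ET.fromstring(sppr_xml)
    embed_attr = "{%s}embed" % NS["r"]
    return [
        blip.get(embed_attr)
        for blip in root.iterfind(".//a:blip", NS)
        if blip.get(embed_attr) is not None
    ]


print(embedded_rids(SAMPLE_SPPR_XML))  # ['rId9']
```

The point to take away is that "embed" lives in the relationships namespace, so an expression that ignores the namespace will not find it.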

If you can't work it out, post the XML you printed and we can get you to the next step.
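Putting the steps of this answer together, a rough sketch might look like the following. Several things here are assumptions rather than confirmed API: the function names and output file names are made up for illustration, the XPath expression ".//a:blip/@r:embed" is one plausible choice, newer python-pptx releases appear to offer a related_part(rId) method in place of the related_parts dict (the sketch tries both), and the image part is assumed to carry an ext attribute. Treat it as a starting point to adapt against the XML you actually see.

```python
import os


def lookup_part(slide, rid):
    """Resolve a relationship id such as 'rId9' to the related part."""
    part = slide.part
    if hasattr(part, "related_part"):
        # Assumption: newer releases expose a method instead of the dict.
        return part.related_part(rid)
    return part.related_parts[rid]  # the dict described in this answer


def save_picture_fills(slide, slide_number, out_dir):
    """Write every image used as a shape fill on this slide; return the paths."""
    saved = []
    for shape in slide.shapes:
        fill = getattr(shape, "fill", None)  # not every shape type has a fill
        if fill is None:
            continue
        # _xPr is private API, so this may break between library versions.
        for rid in fill._xPr.xpath(".//a:blip/@r:embed"):
            image_part = lookup_part(slide, rid)
            ext = getattr(image_part, "ext", "bin")  # assumed attribute
            name = "fill{}_slide{}.{}".format(len(saved), slide_number, ext)
            path = os.path.join(out_dir, name)
            with open(path, "wb") as f:
                f.write(image_part.blob)
            saved.append(path)
    return saved


# Usage, assuming python-pptx is installed, "deck.pptx" is a hypothetical
# file, and the "images" directory already exists:
#   from pptx import Presentation
#   prs = Presentation("deck.pptx")
#   for number, slide in enumerate(prs.slides, start=1):
#       print(save_picture_fills(slide, number, "images"))
```

Shapes without a picture fill simply yield no relationship ids, so no separate check of shape.fill.type is needed in this sketch.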

Answer 2

Here is my approach, thanks to scanny.

    for slide in prs.slides:
        slide_count += 1

        slide_parts = list(slide._part.related_parts.keys())
        for part in slide_parts:
            image_part = slide._part.related_parts[part]
            # the original `type(...) == A or B` was always true; isinstance states the intended check
            if isinstance(image_part, (pptx.parts.image.ImagePart, pptx.opc.package.Part)):
                file_startswith = image_part.blob[0:1]
                if file_startswith == b'\x89' or file_startswith == b'\xff' or file_startswith == b'\x47':
                    with open('{}/image{}_slide{}.png'.format(images_path, picture_count, slide_count), 'wb') as f:
                        f.write(image_part.blob)
                        picture_count += 1

The if condition that checks for PNG, JPEG, or GIF is there because a pptx.opc.package.Part is not always an image.

Actually, since I'm already checking the first byte of image_part.blob, I don't think I need the type check on image_part at all.

But as long as it works...
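The one-byte check above also matches any non-image part that happens to start with the same byte, and every file is saved with a .png name even when it is a JPEG or GIF. A slightly stricter sketch compares the full file signatures and returns a matching extension. The helper name sniff_image_ext is made up for illustration, and only the three formats mentioned in this answer are covered.

```python
# Leading bytes ("magic numbers") of the three formats checked in this answer.
IMAGE_SIGNATURES = (
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpg"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
)


def sniff_image_ext(blob):
    """Return 'png', 'jpg', or 'gif' based on the blob's signature, else None."""
    for signature, ext in IMAGE_SIGNATURES:
        if blob.startswith(signature):
            return ext
    return None


print(sniff_image_ext(b"\x89PNG\r\n\x1a\n" + b"\x00" * 8))  # png
print(sniff_image_ext(b"<?xml version='1.0'?>"))  # None
```

In the loop above, a part would be written only when sniff_image_ext(image_part.blob) is not None, with the returned value used as the file extension.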

Answer 1
2 Accepted 2020-11-20 03:47:07

Answer 2
2 2020-11-21 23:07:49
