How to extract images from Word with media_extract in R?

Question

I'm working in R Markdown to generate a report that extracts and displays images taken from a Word document.

To do this I'm using the officer package. It has a function called media_extract that can "extract files from an rdocx or rpptx object".

In short, I'm struggling to locate images without the media_file column.

The media_file value is passed as the path argument of media_extract to locate the image. See the example code from the package documentation below:

```r
library(officer)

example_pptx <- system.file(package = "officer",
  "doc_examples/example.pptx")
doc <- read_pptx(example_pptx)
content <- pptx_summary(doc)
image_row <- content[content$content_type %in% "image", ]
media_file <- image_row$media_file
png_file <- tempfile(fileext = ".png")
media_extract(doc, path = media_file, target = png_file)
```

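The file path comes from either docx_summary or pptx_summary, depending on the file type; both create a data frame summary of the file. The pptx_summary data frame contains a media_file column that holds the image's file path, but the docx_summary data frame does not. Another Stack Overflow post proposed a solution using the word/media/ subdirectory, which seems to work, but I'm not sure what that means or how to use it.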

How can I extract images from a Word document, using the word/media/ subdirectory as the media path?

Answer 1

I kept researching this and found the answer, so I thought I'd share!

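My difficulty extracting images from a docx came from the summary data frame (produced by docx_summary) having no media_file column, which is what is used to locate the desired image. This column does exist in the data frame that pptx_summary generates for pptx files, and it is used in the example code in the package documentation.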

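Without this column, you need to locate the image by its path inside the document itself (a .docx file is a ZIP archive of XML files, with embedded images stored under word/media/), like this: media_file <- "/word/media/image3.png"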
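Because a .docx is a ZIP archive, base R's `utils::unzip()` with `list = TRUE` can list the entries under `word/media/` without extracting anything, which shows which paths actually exist in a given file. This is a minimal sketch, not part of officer; the helper names `media_entries()` and `list_docx_media()` are hypothetical. Entry names are returned without a leading slash (for example `word/media/image3.png`), so prepend `/` if you want to match the form used in the example below.

```r
# Keep only archive entries that sit inside the word/media/ folder.
# media_entries() is a hypothetical helper used for illustration.
media_entries <- function(entry_names) {
  entry_names[startsWith(entry_names, "word/media/")]
}

# List the media files embedded in a .docx (a ZIP archive).
# list_docx_media() is a hypothetical helper, not an officer function.
list_docx_media <- function(docx_path) {
  # list = TRUE returns a data frame (Name, Length, Date) without extracting
  entries <- utils::unzip(docx_path, list = TRUE)
  media_entries(entries$Name)
}

# Usage (adjust the path to your own document):
# list_docx_media("/Users/user.name/Documents/mydoc.docx")
```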

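If you want to see what this structure looks like, right-click your document > 7-Zip > Extract files...; this creates a folder containing the document's contents. Otherwise, just change the image number to select the image you want. Note: images are not always named following the imageN.png pattern (e.g. image3.png), so you may need to extract the files to find the name of the image you want.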
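If 7-Zip isn't available, the same inspection can be done from R by unzipping only the `word/media/` entries into a folder. This is a swapped-in base R technique (`utils::unzip()`), not something officer provides, and `extract_docx_media()` is a hypothetical helper name. The returned paths make it easy to see the real image names.

```r
# Base-R alternative to extracting the .docx with 7-Zip: unzip only the
# word/media/ entries into a folder so the actual image names are visible.
# extract_docx_media() is a hypothetical helper, not part of officer.
extract_docx_media <- function(docx_path, exdir = tempfile("docx_media_")) {
  entries <- utils::unzip(docx_path, list = TRUE)$Name
  media <- entries[startsWith(entries, "word/media/")]
  if (length(media) == 0) {
    return(character(0))  # the document has no embedded media
  }
  # unzip() creates exdir if needed and returns the extracted file paths
  utils::unzip(docx_path, files = media, exdir = exdir)
}

# Usage:
# imgs <- extract_docx_media("/Users/user.name/Documents/mydoc.docx")
# basename(imgs)  # names to use in "/word/media/<name>"
```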

An example of using media_extract with a docx:

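```r
# Extract an image from a Word document using the officer package
library(officer)

report <- read_docx("/Users/user.name/Documents/mydoc.docx")

# Temporary file that will receive the extracted image
png_file <- tempfile(fileext = ".png")

# Path of the image inside the .docx package
media_file <- "/word/media/image3.png"

# Copy the image out of the document; prints TRUE on success
media_extract(report, path = media_file, target = png_file)
```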

The output you are looking for is TRUE. The image can then be included in the report with knitr (or another method):

```r
knitr::include_graphics(png_file)
```
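When a report needs several images, the steps above can be wrapped in a small function. This is a minimal sketch: `docx_image()` is a hypothetical helper, and it assumes, as in the answer, that `media_extract()` returns `TRUE` when the copy succeeds and that images are addressed as `/word/media/<name>`.

```r
# Hypothetical convenience wrapper around officer::media_extract() for .docx
# files, following the answer's approach of addressing the image by its path
# inside the document package ("/word/media/<name>").
docx_image <- function(doc, image_name) {
  # Keep the original extension so knitr can detect the image type
  ext <- tools::file_ext(image_name)
  target <- tempfile(fileext = if (nzchar(ext)) paste0(".", ext) else "")
  ok <- officer::media_extract(doc,
                               path = paste0("/word/media/", image_name),
                               target = target)
  if (!isTRUE(ok)) {
    stop("Could not extract ", image_name, " from the document")
  }
  target
}

# Usage inside an R Markdown chunk:
# report <- officer::read_docx("/Users/user.name/Documents/mydoc.docx")
# knitr::include_graphics(docx_image(report, "image3.png"))
```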


(Answer 1: score 0, posted 2021-10-19 14:06:32)
