
Reading images from Word files using C# Word API without using Clipboard

I've been working on an application that reads images from multiple Word files and stores them in a single Word file, using Microsoft.Office.Interop.Word in C#.

EDIT: I also need to save a copy of the images on the file system, so I need each image in a Bitmap or similar object.

This is my implementation so far, which works fine:

        foreach (InlineShape shape in doc.InlineShapes)
        {
            shape.Range.Select();
            if (shape.Type == WdInlineShapeType.wdInlineShapePicture)
            {
                doc.ActiveWindow.Selection.Range.CopyAsPicture();
                ImageData = Clipboard.GetDataObject();
                object _ob1 = ImageData.GetData(DataFormats.Bitmap);
                bmp = (Bitmap)_ob1;
                images[i++] = bmp;
                /*
                bmp.Save("C:\\Users\\Akshay\\Pictures\\bitmaps\\test" + i.ToString() + ".bmp");
                */
            }
        }



I have:

  • Selected the images as InlineShapes
  • Copied each shape to the Clipboard
  • Stored the Clipboard contents in a DataObject
  • Extracted the shape from the DataObject in Bitmap format and stored it in a Bitmap object



I've been told to refrain from using the Clipboard in Word automation and to use the Word APIs instead. I've read up on it and found an SO answer stating the same.



I looked up many implementations of reading images from Word files on MSDN, SO, etc., but could not find any that avoid the Clipboard.

How do I read images from Word files using only the Word APIs from the Microsoft.Office.Interop.Word namespace, without using the Clipboard?

When a Word document in the Office Open XML file format is serialized as XML, its images are stored as Base64, so it should be possible to extract that data and convert or stream it to a file. While the document is open in the Word application, you can access this information through the Range.WordOpenXML property.

string shapeBase64 = shape.Range.WordOpenXML;

This will return the entire Word Open XML in the flat file OPC format. In other words, it won't contain only the picture in Base64, but the entire zip package definition as XML that surrounds it. In my quick test, the tag that contains the actual Base64 is

<pkg:binaryData>

That's a child element of

<pkg:part pkg:name="/word/media/image1.jpg" pkg:contentType="image/jpeg" pkg:compression="store">
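As a rough illustration of that idea, the sketch below parses a flat OPC string and decodes every image part it finds. The class and method names (`FlatOpcImages`, `ExtractImages`) are hypothetical, and selecting parts by a content type starting with `image/` is an assumption based on the `pkg:part` element shown above, not something verified against every document.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;

public static class FlatOpcImages
{
    // XML namespace used by the pkg: prefix in flat OPC documents.
    private static readonly XNamespace Pkg =
        "http://schemas.microsoft.com/office/2006/xmlPackage";

    // Hypothetical helper: maps each image part name
    // (e.g. "/word/media/image1.jpg") to its decoded bytes.
    public static Dictionary<string, byte[]> ExtractImages(string flatOpcXml)
    {
        var images = new Dictionary<string, byte[]>();
        XDocument xml = XDocument.Parse(flatOpcXml);

        foreach (XElement part in xml.Descendants(Pkg + "part"))
        {
            string name = (string)part.Attribute(Pkg + "name");
            string contentType = (string)part.Attribute(Pkg + "contentType");
            XElement data = part.Element(Pkg + "binaryData");

            // Skip XML parts and anything that is not declared as an image.
            if (name == null || contentType == null || data == null) continue;
            if (!contentType.StartsWith("image/", StringComparison.OrdinalIgnoreCase)) continue;

            // FromBase64String ignores the line breaks inside the Base64 text.
            images[name] = Convert.FromBase64String(data.Value);
        }
        return images;
    }
}
```

With Interop, the input would be the string returned by `shape.Range.WordOpenXML`. The resulting bytes can be written to disk with `File.WriteAllBytes`, or loaded into a `Bitmap` through a `MemoryStream` if an in-memory image object is needed.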

Note that you could also get the entire document's WordOpenXML in one step:

document.Content.WordOpenXML

but you might then need to understand how the InlineShapes in the document body are linked to the actual data in the "media" part.
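One way that link could be followed is sketched below, on the assumption that each picture in the body is a DrawingML `a:blip` element whose `r:embed` attribute names a relationship in the `/word/_rels/document.xml.rels` part. The names `FlatOpcImageOrder` and `MediaPartsInBodyOrder` are hypothetical, and legacy VML pictures or linked (non-embedded) images are not handled.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class FlatOpcImageOrder
{
    private static readonly XNamespace Pkg = "http://schemas.microsoft.com/office/2006/xmlPackage";
    private static readonly XNamespace A = "http://schemas.openxmlformats.org/drawingml/2006/main";
    private static readonly XNamespace R = "http://schemas.openxmlformats.org/officeDocument/2006/relationships";
    private static readonly XNamespace Rel = "http://schemas.openxmlformats.org/package/2006/relationships";

    // Hypothetical helper: returns media part names in the order their
    // pictures are referenced in the document body.
    public static List<string> MediaPartsInBodyOrder(string flatOpcXml)
    {
        XDocument xml = XDocument.Parse(flatOpcXml);
        XElement FindPart(string name) => xml.Descendants(Pkg + "part")
            .FirstOrDefault(p => (string)p.Attribute(Pkg + "name") == name);

        var ordered = new List<string>();
        XElement body = FindPart("/word/document.xml");
        XElement rels = FindPart("/word/_rels/document.xml.rels");
        if (body == null || rels == null) return ordered;

        // Relationship Id (e.g. "rId4") -> Target (e.g. "media/image1.jpg").
        Dictionary<string, string> targets = rels.Descendants(Rel + "Relationship")
            .ToDictionary(r => (string)r.Attribute("Id"), r => (string)r.Attribute("Target"));

        foreach (XElement blip in body.Descendants(A + "blip"))
        {
            string id = (string)blip.Attribute(R + "embed");
            if (id != null && targets.TryGetValue(id, out string target))
            {
                // Relative targets are resolved against the /word/ folder.
                ordered.Add(target.StartsWith("/") ? target : "/word/" + target);
            }
        }
        return ordered;
    }
}
```

The returned names match the `pkg:name` attributes of the media parts, so they can be used to look up the corresponding `pkg:binaryData` in the same package.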

And it would be possible, of course, to work directly with the Zip package (using the Open XML SDK, perhaps) instead of opening the document in Word.Application.
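A minimal sketch of the Zip route follows. It uses the framework's `System.IO.Compression.ZipArchive` rather than the Open XML SDK mentioned above, and it assumes the images sit under `word/media/` inside the package, as the part name in the flat OPC output suggests. `DocxMedia` and `ReadMedia` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

public static class DocxMedia
{
    // Hypothetical helper: reads every entry under word/media/ from a
    // .docx stream and returns entry name -> raw image bytes.
    public static Dictionary<string, byte[]> ReadMedia(Stream docxStream)
    {
        var media = new Dictionary<string, byte[]>();
        using (var zip = new ZipArchive(docxStream, ZipArchiveMode.Read, leaveOpen: true))
        {
            foreach (ZipArchiveEntry entry in zip.Entries)
            {
                // Assumption: embedded pictures live in the word/media/ folder.
                if (!entry.FullName.StartsWith("word/media/", StringComparison.OrdinalIgnoreCase)) continue;

                using (Stream source = entry.Open())
                using (var buffer = new MemoryStream())
                {
                    source.CopyTo(buffer);
                    media[entry.FullName] = buffer.ToArray();
                }
            }
        }
        return media;
    }
}
```

A caller could pass `File.OpenRead(path)` for a closed .docx file; no Word instance and no Clipboard are involved in this path.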
