Read images from docx file with python-docx
I have a docx file which contains images, shown below in unzipped document.xml format. Here, the particular image file is referred to by its id within the docx structure: rId5.
<w:p>
<w:pPr>
<w:framePr w:h="13450" w:wrap="notBeside" w:vAnchor="text" w:hAnchor="text" w:xAlign="center" w:y="1"/>
<w:widowControl w:val="0"/>
<w:jc w:val="center"/>
<w:rPr>
<w:sz w:val="2"/>
<w:szCs w:val="2"/>
</w:rPr>
</w:pPr>
<w:r>
<w:pict>
<v:shapetype id="_x0000_t75" coordsize="21600,21600" o:spt="75" o:preferrelative="t" path="m@4@5l@4@11@9@11@9@5xe" filled="f" stroked="f">
<v:stroke joinstyle="miter"/>
<v:formulas>
<v:f eqn="if lineDrawn pixelLineWidth 0"/>
<v:f eqn="sum @0 1 0"/>
<v:f eqn="sum 0 0 @1"/>
<v:f eqn="prod @2 1 2"/>
<v:f eqn="prod @3 21600 pixelWidth"/>
<v:f eqn="prod @3 21600 pixelHeight"/>
<v:f eqn="sum @0 0 1"/>
<v:f eqn="prod @6 1 2"/>
<v:f eqn="prod @7 21600 pixelWidth"/>
<v:f eqn="sum @8 21600 0"/>
<v:f eqn="prod @7 21600 pixelHeight"/>
<v:f eqn="sum @10 21600 0"/>
</v:formulas>
<v:path o:extrusionok="f" gradientshapeok="t" o:connecttype="rect"/>
<o:lock v:ext="edit" aspectratio="t"/>
</v:shapetype>
<v:shape id="_x0000_s1026" type="#_x0000_t75" style="width:486pt;height:673pt;">
<v:imagedata r:id="rId5" r:href="rId6"/>
</v:shape>
</w:pict>
</w:r>
</w:p>
I tried to use the document.inline_shapes property to read the images, but the following prints 0:
import docx

PATH = "/home/amoe/test.docx"
doc = docx.Document(PATH)
print(len(doc.inline_shapes))
Is there any other way I can read this data? I can see that the image is contained within a 'run', but I can't see any way to use the API of the docx.text.Run class to access the image. The id of the imagedata element would be enough.
Refer to the python-docx 0.8.9 documentation:
Word documents have two layers, a text layer and a drawing layer. When a picture appears in the text layer it is called an inline picture. At the time of writing, python-docx only supports inline pictures.
I assume your pictures are in the drawing layer, so you can't read them with python-docx.
You can also read this post: https://stackoverflow.com/a/27705408/8484506
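If the relationship id of each imagedata element is all that is needed, one possible workaround is to bypass python-docx and read the package directly, since a docx file is a zip archive. The sketch below uses only the standard library. It is not part of the python-docx API: the function name read_vml_images is hypothetical, and it assumes the main document part lives at word/document.xml with its relationships in word/_rels/document.xml.rels, which is the usual layout but is not guaranteed for every file.

```python
import posixpath
import zipfile
import xml.etree.ElementTree as ET

# Namespace URIs used by the elements and attributes shown in the question.
VML_NS = "urn:schemas-microsoft-com:vml"
REL_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
PKG_REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"


def read_vml_images(docx_path):
    """Return {relationship id: image bytes} for each v:imagedata element.

    Hypothetical helper; assumes the conventional word/document.xml layout.
    """
    images = {}
    with zipfile.ZipFile(docx_path) as archive:
        # Index the relationships of the main document part by their Id.
        rels_root = ET.fromstring(archive.read("word/_rels/document.xml.rels"))
        rels = {
            rel.get("Id"): rel
            for rel in rels_root.iter("{%s}Relationship" % PKG_REL_NS)
        }

        body = ET.fromstring(archive.read("word/document.xml"))
        for imagedata in body.iter("{%s}imagedata" % VML_NS):
            rel_id = imagedata.get("{%s}id" % REL_NS)  # e.g. "rId5"
            rel = rels.get(rel_id)
            # Skip missing ids and images that are linked rather than embedded.
            if rel is None or rel.get("TargetMode") == "External":
                continue
            target = rel.get("Target")
            if target.startswith("/"):
                # Absolute target: relative to the package root.
                member = target.lstrip("/")
            else:
                # Relative target: resolved against the word/ folder.
                member = posixpath.normpath(posixpath.join("word", target))
            images[rel_id] = archive.read(member)
    return images
```

Calling read_vml_images("/home/amoe/test.docx") on the file from the question should then return a dictionary keyed by ids such as rId5, provided that relationship points at an embedded image. The r:href attribute (rId6 in the question) is not followed here.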