
I can't extract text from a PowerPoint 2003 file with olefile

I can't extract text from a PowerPoint 2003 .ppt file. When I run the following code, the Python shell stops responding (hangs).

import olefile
ole = olefile.OleFileIO('mypowerpoint.ppt')
text = ole.openstream('PowerPoint Document')
read = text.read()
print(read)

I think this is because the "PowerPoint Document" stream contains mostly binary data, and dumping that much raw binary to the shell can overwhelm it. You need to process the stream to extract the text before printing it.

Alternatively, you can use print(repr(read)) to see what the stream contains.
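
As a rough illustration of what "processing" the stream could look like, here is a minimal sketch. It assumes the stream follows the record layout of the PowerPoint binary format (an 8-byte header of version/instance, type, and length before each record) and that the text of interest sits in TextCharsAtom (0x0FA0, UTF-16LE) and TextBytesAtom (0x0FA8, one byte per character) records. The helper names extract_text and read_ppt_text are made up for this example and are not part of olefile.

```python
import struct

# Record type IDs as assumed from the PowerPoint binary format (MS-PPT).
TEXT_CHARS_ATOM = 0x0FA0  # text stored as UTF-16LE
TEXT_BYTES_ATOM = 0x0FA8  # text stored as one byte per character


def extract_text(data):
    """Walk the records in a 'PowerPoint Document' stream and collect text runs."""
    texts = []
    pos = 0
    while pos + 8 <= len(data):
        # 8-byte record header: version/instance, record type, payload length.
        ver_inst, rec_type, rec_len = struct.unpack_from('<HHI', data, pos)
        pos += 8
        if ver_inst & 0x000F == 0x000F:
            # Container record: its children follow immediately, so keep
            # walking rather than skipping over the payload.
            continue
        payload = data[pos:pos + rec_len]
        if rec_type == TEXT_CHARS_ATOM:
            texts.append(payload.decode('utf-16-le', errors='replace'))
        elif rec_type == TEXT_BYTES_ATOM:
            texts.append(payload.decode('latin-1'))
        pos += rec_len
    return texts


def read_ppt_text(path):
    """Hypothetical helper: open a .ppt with olefile and return its text runs."""
    import olefile  # imported here so extract_text works without olefile installed

    ole = olefile.OleFileIO(path)
    try:
        data = ole.openstream('PowerPoint Document').read()
    finally:
        ole.close()
    return extract_text(data)
```

Calling read_ppt_text('mypowerpoint.ppt') should then return a list of strings rather than raw bytes. Treat the result as approximate: the same text may show up more than once, paragraphs inside a run are typically separated by '\r', and damaged or unusual files may need a more careful parser.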
