Python olefile to read text from PPT files
My current code, which uses olefile to extract text from a .ppt file, only gives me binary data:
import olefile
ole = olefile.OleFileIO(r'C:\sampleppt.ppt')
print(ole.listdir())
data = ole.openstream('PowerPoint Document').read()
print(data)
ole.close()
How do I use olefile properly to extract the text from .ppt files?
For macOS Homebrew users: install Apache Tika (brew install tika). I think it supports other operating systems as well.
The command-line interface works like this:
tika --text something.ppt > something.txt
To use it from inside a Python script:
import os
os.system("tika --text temp.ppt > temp.txt")
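A slightly more defensive variant of the snippet above, as a sketch only: subprocess.run passes the file name as a separate argument (so spaces in the path need no shell quoting), raises an error if the command fails, and returns the text directly instead of writing temp.txt. The function name run_text_extractor and its command parameter are illustrative, and the default assumes the tika launcher from the Homebrew install is on PATH.

```python
import subprocess


def run_text_extractor(ppt_path, command=("tika", "--text")):
    """Run an external extractor on ppt_path and return what it prints.

    `command` defaults to the Tika CLI call shown above; it is a
    parameter only so that a different launcher can be substituted.
    """
    result = subprocess.run(
        [*command, ppt_path],   # no shell involved, so no quoting problems
        capture_output=True,    # collect stdout/stderr instead of printing
        text=True,              # decode output using the locale encoding
        check=True,             # raise CalledProcessError on a non-zero exit
    )
    return result.stdout


if __name__ == "__main__":
    print(run_text_extractor("temp.ppt"))
```

If tika is not installed, subprocess.run raises FileNotFoundError, which is easier to notice than the silent non-zero return value of os.system.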
That extracts the text, and it is the only solution I have found so far.
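As for why the code in the question prints binary data: the 'PowerPoint Document' stream is not plain text but a sequence of binary records, so reading it with olefile is only the first step. The sketch below is a rough, unofficial take on walking those records. It assumes the layout described in Microsoft's MS-PPT format documentation (an 8-byte header per record, text stored in TextCharsAtom records of type 0x0FA0 as UTF-16LE and in TextBytesAtom records of type 0x0FA8 as one byte per character). The function names are made up for illustration, and the code has not been checked against a wide range of files.

```python
import struct

TEXT_CHARS_ATOM = 0x0FA0  # text stored as UTF-16LE (assumed from MS-PPT)
TEXT_BYTES_ATOM = 0x0FA8  # text stored as one byte per character


def iter_text_runs(data):
    """Yield text runs found in the bytes of a 'PowerPoint Document' stream."""
    pos = 0
    while pos + 8 <= len(data):
        # Record header: version/instance (2 bytes), type (2), length (4).
        ver_inst, rec_type, rec_len = struct.unpack_from("<HHI", data, pos)
        pos += 8
        if ver_inst & 0x000F == 0x000F:
            # Container record: its children follow immediately, so keep
            # scanning from here instead of skipping over the body.
            continue
        body = data[pos:pos + rec_len]
        pos += rec_len
        if rec_type == TEXT_CHARS_ATOM:
            yield body.decode("utf-16-le", errors="replace")
        elif rec_type == TEXT_BYTES_ATOM:
            yield body.decode("latin-1")


def extract_ppt_text(path):
    """Hypothetical helper: read the stream with olefile and collect text."""
    import olefile  # third-party package: pip install olefile

    ole = olefile.OleFileIO(path)
    try:
        data = ole.openstream("PowerPoint Document").read()
    finally:
        ole.close()
    # Paragraphs inside a text run are typically separated by '\r'.
    return [run.replace("\r", "\n") for run in iter_text_runs(data)]
```

Something like print("\n".join(extract_ppt_text(r"C:\sampleppt.ppt"))) would then print the runs in stream order. Expect rough edges: placeholder text from master slides may show up too, and encrypted files are not handled, so Tika remains the more robust route.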