I found this solution for reading the content of a Word file from a URL:
from urllib.request import urlopen
from bs4 import BeautifulSoup
from io import BytesIO
from zipfile import ZipFile
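# url is assumed to hold the address of the .docx file
file = urlopen(url).read()
file = BytesIO(file)  # wrap the downloaded bytes in a file-like object
document = ZipFile(file)  # a .docx file is a zip archive
content = document.read('word/document.xml')  # main document body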
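# Name the parser explicitly to avoid the "no parser specified" warning
word_obj = BeautifulSoup(content.decode('utf-8'), 'html.parser')
text_document = word_obj.find_all('w:t')  # text runs are stored in <w:t> elements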
for t in text_document:
    print(t.text)
Does anyone know a similar way to process pptx files? I have seen several solutions, but they all read the file directly from disk, not from a URL.
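I don't know whether this will help you, but you can fetch the content of the pptx with urllib (the variable file) and then, in whichever function expects a pptx file path, pass cStringIO.StringIO(file) to simulate a file. In Python 3, cStringIO no longer exists; the equivalent is io.BytesIO(file), as in the Word example above.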
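To make that concrete, here is a minimal sketch that mirrors the Word approach using only the standard library. It assumes the usual pptx layout, where each slide is stored as ppt/slides/slideN.xml inside the zip archive and text runs are <a:t> elements in the DrawingML namespace. The function names pptx_text_from_bytes and pptx_text_from_url are made up for this example, and sorting by file name is only an approximation of the presentation order (the real order is defined in the presentation's relationship files).

```python
import re
import xml.etree.ElementTree as ET
from io import BytesIO
from urllib.request import urlopen
from zipfile import ZipFile

# DrawingML namespace; text runs on a slide are stored in <a:t> elements.
A_T = "{http://schemas.openxmlformats.org/drawingml/2006/main}t"
# Matches slide parts only, not ppt/slides/_rels/... entries.
SLIDE_RE = re.compile(r"ppt/slides/slide(\d+)\.xml$")


def pptx_text_from_bytes(data):
    """Return a list of slides, each a list of the text runs found on it."""
    slides = []
    with ZipFile(BytesIO(data)) as archive:
        # Sort numerically so that slide10 comes after slide2.
        names = sorted(
            (n for n in archive.namelist() if SLIDE_RE.match(n)),
            key=lambda n: int(SLIDE_RE.match(n).group(1)),
        )
        for name in names:
            root = ET.fromstring(archive.read(name))
            slides.append([t.text for t in root.iter(A_T) if t.text])
    return slides


def pptx_text_from_url(url):
    """Download a pptx file (hypothetical helper) and extract its text."""
    with urlopen(url) as response:
        return pptx_text_from_bytes(response.read())


# Usage, with url assumed to point at a .pptx file:
# for number, runs in enumerate(pptx_text_from_url(url), start=1):
#     print(number, " ".join(runs))
```

If the python-pptx package is available, its Presentation constructor also accepts a file-like object, so passing it BytesIO(urlopen(url).read()) should work as well and gives access to shapes and notes rather than raw text runs.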