
Read pptx file content from a url

I found this solution for reading the content of a Word file from a URL:

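from urllib.request import urlopen
from bs4 import BeautifulSoup
from io import BytesIO
from zipfile import ZipFile

# url is assumed to hold the address of the .docx file
file = urlopen(url).read()
file = BytesIO(file)  # wrap the bytes so ZipFile can treat them as a file
document = ZipFile(file)
content = document.read('word/document.xml')
# name the parser explicitly to avoid bs4's "no parser specified" warning
word_obj = BeautifulSoup(content.decode('utf-8'), 'html.parser')
text_document = word_obj.find_all('w:t')  # text runs live in <w:t> elements
for t in text_document:
    print(t.text)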

Does anyone know a similar way to process pptx files? I have seen several solutions, but they all read the file from disk rather than from a URL.

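I don't know whether this will help, but you can use urllib to get the content of the pptx (the variable file) and then, in whatever function expects a pptx file path, pass cStringIO.StringIO(file) to simulate a file. (On Python 3, io.BytesIO(file) plays the same role, as in the question's code.)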
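To make that concrete, here is a minimal sketch using only the standard library. It relies on the fact that a .pptx is a ZIP archive, like a .docx, with slides stored as ppt/slides/slideN.xml and text runs in DrawingML <a:t> elements. The function names pptx_text and pptx_text_from_url are made up for this example, and sorting by the number in the file name is an assumption: it usually matches the presentation order, but the authoritative order is defined in ppt/presentation.xml.

```python
import re
import xml.etree.ElementTree as ET
from io import BytesIO
from urllib.request import urlopen
from zipfile import ZipFile

# DrawingML namespace; text runs in slides are stored in <a:t> elements.
A_NS = "{http://schemas.openxmlformats.org/drawingml/2006/main}"
SLIDE_RE = re.compile(r"ppt/slides/slide(\d+)\.xml")


def pptx_text(data):
    """Return one list of text runs per slide, given the raw bytes of a .pptx."""
    slides = []
    with ZipFile(BytesIO(data)) as archive:
        # Keep only slide parts and sort numerically so slide10 follows slide9.
        matches = [SLIDE_RE.fullmatch(name) for name in archive.namelist()]
        numbered = sorted((int(m.group(1)), m.group(0)) for m in matches if m)
        for _, name in numbered:
            root = ET.fromstring(archive.read(name))
            slides.append([t.text or "" for t in root.iter(A_NS + "t")])
    return slides


def pptx_text_from_url(url):
    """Hypothetical helper: download a .pptx and extract its text."""
    with urlopen(url) as response:
        return pptx_text(response.read())
```

Calling pptx_text_from_url(url) with the address of a presentation returns a list of lists, one per slide. If you would rather use python-pptx, its Presentation() constructor also accepts a file-like object, so Presentation(BytesIO(data)) should work without saving the download to disk.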
