Extracting text from a PDF file object in Python

Question

Can we extract text from a PDF file object received in a request? For example:

f = request.FILES.get('file', None)

Given f, can we extract the text of the document the same way we read the content of a text file object?

Answer 1

Try the textract library.

It supports many formats, including PDF.

import textract
# process() takes a path to a file on disk and returns the extracted text as bytes
text = textract.process("path/to/file.extension")
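Since textract.process works on a file path and the question starts from an uploaded file object, one way to bridge the two is to copy the upload to a temporary file first. The following is only a minimal sketch under some assumptions: f behaves like Django's UploadedFile (it provides .chunks()), the upload really is a PDF, and the extracted bytes are UTF-8. The helper name extract_text_from_upload and its extractor parameter are hypothetical, introduced here for illustration; extractor defaults to textract.process.

```python
import os
import tempfile


def extract_text_from_upload(uploaded, extractor=None):
    """Return the text of an uploaded PDF as a str.

    `uploaded` is assumed to behave like Django's UploadedFile, i.e. it
    provides .chunks(). `extractor` is a hypothetical hook that takes a
    file path and returns bytes; it defaults to textract.process.
    """
    if extractor is None:
        # Imported lazily so the helper can be exercised without textract.
        import textract
        extractor = textract.process

    # textract reads from a path, so write the upload to a temporary file.
    # The .pdf suffix lets textract pick its parser from the extension.
    tmp = tempfile.NamedTemporaryFile(suffix=".pdf", delete=False)
    try:
        for chunk in uploaded.chunks():
            tmp.write(chunk)
        tmp.close()  # flush and close before another reader opens the file
        raw = extractor(tmp.name)
    finally:
        tmp.close()
        os.remove(tmp.name)

    # Assumption: the extracted bytes are UTF-8; undecodable bytes are replaced.
    return raw.decode("utf-8", errors="replace")
```

In a view this could be called as text = extract_text_from_upload(f) after checking that f is not None. The temporary file is removed whether or not extraction succeeds.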