Accessing a BytesIO object after creation

Question

I am working on a Scrapy spider, trying to extract text from multiple PDFs in a directory using slate ( https://pypi.python.org/pypi/slate ). I have no interest in saving the actual PDFs to disk, so I've been advised to look into the io.BytesIO class at https://docs.python.org/2/library/io.html#buffered-streams . Based on "Creating bytesIO object", I have initialized the BytesIO class with the PDF body, but now I need to pass the data to the slate module. So far I have:

def save_pdf(self, response):
    in_memory_pdf = BytesIO(response.body)

    with open(in_memory_pdf, 'rb') as f:
        doc = slate.PDF(f)
        print(doc[0])

I'm getting:

in_memory_pdf.read(response.body)
TypeError: integer argument expected, got 'str'

How can I get this working?

edit:

with open(in_memory_pdf, 'rb') as f:
TypeError: coercing to Unicode: need string or buffer, _io.BytesIO found

edit 2:

def save_pdf(self, response):
    in_memory_pdf = BytesIO(bytes(response.body))
    in_memory_pdf.seek(0)
    doc = slate.PDF(in_memory_pdf)
    print(doc)

Answer 1

You already know the answer. It is clearly stated in the Python TypeError message and in the documentation:

class io.BytesIO([initial_bytes])

BytesIO accepts bytes, and you are passing it the contents, i.e. response.body, which is a string.
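To make the point about the tracebacks concrete, here is a minimal sketch of how a BytesIO object behaves once it has been created. It only uses the standard library. `pdf_body` is a hypothetical stand-in for `response.body`, and `first_bytes` is an illustrative helper, not part of Scrapy or slate. The key ideas: `read()` takes an optional integer size rather than the data itself, and the object is already file-like, so it does not go through `open()`, which expects a path.

```python
from io import BytesIO


def first_bytes(file_obj, size):
    """Return the first `size` bytes of any binary file-like object."""
    file_obj.seek(0)            # rewind to the start of the buffer
    return file_obj.read(size)  # read() takes an integer size, not the data


# Hypothetical stand-in for response.body (assumption: the body is raw bytes).
pdf_body = b"%PDF-1.4\n% dummy payload, not a real PDF\n"

# Wrap the bytes once; the result already behaves like an open binary file.
in_memory_pdf = BytesIO(pdf_body)

header = first_bytes(in_memory_pdf, 8)

# Rewind again before handing the object to any consumer that reads from it.
in_memory_pdf.seek(0)
everything = in_memory_pdf.read()  # no argument: read to the end
```

With the buffer rewound, the object can be passed wherever an open binary file is expected, which is what the `slate.PDF(in_memory_pdf)` call in the question's second edit attempts.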

Answer 1 (2016-09-30 20:24:04)
