Get a PDF file from a website and write it to disk

Question

I have some code that reads a URL and writes the response to disk. Here it is:

    import os
    import requests

    url = 'http://www.cs.purdue.edu/homes/ninghui/courses/Spring06/lectures/lecture05.pdf'
    ret = requests.get(url)
    print ret.headers
    print ret.headers['content-encoding']
    print ret.headers['content-type']

    pathToWrite = 'tmp/test.pdf'

    try:
        fd = os.open(pathToWrite, os.O_RDWR | os.O_CREAT)

        try:
            os.write(fd, ret.text)
        except Exception as e:
            print 'cannot write to file ' + pathToWrite
            raise

        try:
            os.close(fd)
        except:
            print 'cannot close file ' + pathToWrite
            raise

    except:
        print 'file cannot be opened ' + pathToWrite
        raise

When I use the above code to fetch a PDF file and write it to disk, I get the following error:

UnicodeEncodeError: 'charmap' codec can't encode characters in position 12-13: character maps to <undefined>

I get the same error when I use the following API:

    f = open(pathToWrite, 'wb')
    f.write(ret.text)

I feel like I am missing something obvious. This seems too straightforward to go wrong.

Answer 1

You want to write ret.content, not ret.text. ret.content holds the raw bytes of the response, while ret.text tries to convert the PDF to Unicode, which is probably impossible for a binary format like PDF.
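
To illustrate why going through text breaks binary data, here is a small sketch in Python 3 syntax that needs no network access. The byte string is a made-up stand-in for a PDF body, and both encodings are assumptions: ISO-8859-1 stands in for whatever encoding the library guesses for the text, and cp1252 for a Windows default encoding, which is where 'charmap' errors typically come from.

```python
# Stand-in for a PDF body (assumption): the usual header plus a few
# arbitrary non-text bytes, as found in any binary file.
raw = b"%PDF-1.4\n%\x81\x8d\x90\n"

# Assumption: the body gets decoded as ISO-8859-1, which maps every
# byte to some character, so decoding itself never fails.
text = raw.decode("iso-8859-1")

error = None
try:
    # Assumption: the platform default encoding is Windows-1252, which
    # has no mapping for some of the characters produced above.
    text.encode("cp1252")
except UnicodeEncodeError as exc:
    error = exc

print(type(text).__name__, "vs", type(raw).__name__)
print(error)
```

The raw bytes never needed an encoding in the first place, which is why writing them directly avoids the problem.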

Also, you can just use the built-in open function. There is no need for the low-level os.open here.
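
Putting both suggestions together, a minimal sketch might look like the following. The helper names save_response_body and download_pdf are hypothetical, the syntax is Python 3 rather than the Python 2 of the question, and creating the target directory is an added convenience that the answer does not mention.

```python
import os


def save_response_body(response, path):
    """Write the raw bytes of a requests-style response to path.

    Returns the number of bytes written.
    """
    directory = os.path.dirname(path)
    if directory:
        # Added convenience (assumption): make sure e.g. 'tmp/' exists.
        os.makedirs(directory, exist_ok=True)

    # 'wb' opens the file in binary mode, and response.content is bytes,
    # so no text encoding is involved at any point.
    with open(path, "wb") as f:
        f.write(response.content)
    return len(response.content)


def download_pdf(url, path):
    """Fetch url with requests and store the body at path unchanged."""
    # Third-party dependency, imported here so that the helper above
    # can be used and tested without it.
    import requests

    response = requests.get(url)
    response.raise_for_status()  # fail early on HTTP error statuses
    return save_response_body(response, path)
```

Calling download_pdf(url, 'tmp/test.pdf') with the URL from the question should then store the file byte for byte. The with block also closes the file automatically, which replaces the nested try/except around os.close.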

Answer 1: accepted, 2013-08-18 23:17:44