
Too many open files using requests module

I am using the requests module to POST several files to a server, and this works fine most of the time. However, when many files are uploaded (more than 256), I get an IOError: [Errno 24] Too many open files. The problem happens because I build a dictionary with many files, which are opened as shown in the code below. Since I do not have a handle to close these open files, we see this error. This leads to the following questions:

  1. Is there a way to close these files in chunks?
  2. Does the requests module close the open files automatically?

url = 'http://httpbin.org/post'
# dict with several files (> 256)
files = {'file1': open('report.xls', 'rb'), 'file2': open('report2.xls', 'rb')}
r = requests.post(url, files=files)
r.text

The workaround I am using right now is calling files.clear() after uploading fewer than 256 files at a time. I am unsure whether the files get closed that way, but the error goes away.

Please provide insight on how to handle this situation. Thanks

The simplest solution here is to read the files into memory yourself, then pass them to requests. Note that, as the docs say, "If you want, you can send strings to be received as files". So, do that.

In other words, instead of building a dict like this:

files = {tag: open(pathname, 'rb') for (tag, pathname) in stuff_to_send}

… build it like this:

def read_file(pathname):
    with open(pathname, 'rb') as f:
        return f.read()
files = {tag: read_file(pathname) for (tag, pathname) in stuff_to_send}

Now you've only got one file open at a time, guaranteed.

This may seem wasteful, but it really isn't: requests is just going to read in all of the data from all of your files if you don't.*

But meanwhile, let me answer your actual questions rather than just telling you what to do instead.


Since I do not have a handle to close these open files, we see this error.

Sure you do. You have a dict, whose values are these open files.

In fact, if you didn't have a handle to them, this problem would probably occur much less often, because the garbage collector would (usually, but not necessarily robustly/reliably enough to count on) take care of things for you. The fact that it's never doing so implies that you must have a handle to them.


Is there a way to close these files in chunks?

Sure. I don't know how you're doing the chunks, but presumably each chunk is a list of keys or something, and you're passing files = {key: files[key] for key in chunk}, right?

So, after the request, do this:

for key in chunk:
    files[key].close()

Or, if you're building a dict for each chunk like this:

files = {tag: open(filename, 'rb') for (tag, filename) in chunk}

… just do this:

for file in files.values():
    file.close()
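
Putting those two pieces together, here is a minimal sketch of the whole chunked-upload loop, assuming stuff_to_send is a list of (tag, pathname) pairs and that a chunk size of 200 keeps you comfortably under your descriptor limit:

import requests

url = 'http://httpbin.org/post'
chunk_size = 200  # comfortably below the ~256 open-file limit being hit

for start in range(0, len(stuff_to_send), chunk_size):
    chunk = stuff_to_send[start:start + chunk_size]
    # open only this chunk's files
    files = {tag: open(pathname, 'rb') for (tag, pathname) in chunk}
    try:
        r = requests.post(url, files=files)
    finally:
        # close them again before moving on to the next chunk
        for f in files.values():
            f.close()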

Does the requests module close the open files automatically?

No. You have to do it manually.

In many use cases, you get away with never doing so because the files variable goes away soon after the request, and once nobody has a reference to the dict, it gets cleaned up soon (immediately, with CPython and if there are no cycles; just "soon" if either of those is not true), meaning all the files get cleaned up soon, at which point the destructor closes them for you. But you shouldn't rely on that. Always close your files explicitly.
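
If you want that explicit close to happen automatically, and to happen even when the request raises, contextlib.ExitStack (Python 3.3+) is one way to do it. A minimal sketch, again assuming stuff_to_send is an iterable of (tag, pathname) pairs:

import contextlib
import requests

url = 'http://httpbin.org/post'

# every file registered on the stack is closed when the with block exits,
# even if requests.post raises
with contextlib.ExitStack() as stack:
    files = {tag: stack.enter_context(open(pathname, 'rb'))
             for (tag, pathname) in stuff_to_send}
    r = requests.post(url, files=files)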

And the reason the files.clear() seems to work is that it's doing the same thing as letting files go away: it's forcing the dict to forget all the files, which removes the last reference to each of them, meaning they will get cleaned up soon, etc.


* What if you don't have enough page space to hold them all in memory? Then you can't send them all at once anyway. You'll have to make separate requests, or use the streaming API, which I believe means you have to do the multiparting manually as well. But if you have enough page space, just not enough real RAM, so trying to read them all sends you into swap thrashing hell, you might be able to get around it by concatenating them all on-disk, opening the giant file, mmapping segments of it, and sending those as the strings…
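
For what that last idea might look like, here is a rough sketch; the file name big.dat and the layout list are hypothetical, and you would have to record each file's offset and size yourself while concatenating:

import mmap
import requests

url = 'http://httpbin.org/post'

# hypothetical (tag, offset, size) entries describing where each original
# file lives inside the concatenated big.dat
layout = [('file1', 0, 1024), ('file2', 1024, 2048)]

with open('big.dat', 'rb') as f:
    m = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    try:
        # slice each file's segment out of the mapped region and send the
        # resulting bytes as the "string" for that field
        files = {tag: m[offset:offset + size] for (tag, offset, size) in layout}
        r = requests.post(url, files=files)
    finally:
        m.close()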

Don't forget about the power of Python duck typing!

Just implement a wrapper class for your files:

class LazyFile(object):
    """File-like wrapper that opens the underlying file only when read."""

    def __init__(self, filename, mode):
        self.filename = filename
        self.mode = mode

    def read(self):
        # requests calls read() on file-like values, so the file is opened
        # and closed again within this single call
        with open(self.filename, self.mode) as f:
            return f.read()

url = 'http://httpbin.org/post'
#dict with a billion files
files = {'file1': LazyFile('report.xls', 'rb'), 'file2': LazyFile('report2.xls', 'rb')}
r = requests.post(url, files=files)
r.text

In this way, each file is opened, read, and closed one at a time as requests iterates over the dict.

Note that while this answer and abarnert's answer basically do the same thing right now, requests may, in the future, not build the request entirely in memory and then send it, but send each file in a stream, keeping memory usage low. At that point this code would be more memory-efficient.
