
Too many open files using requests module

I am using the requests module to POST several files to a server, and this works fine most of the time. However, when many files (more than 256) are uploaded, I get an IOError: [Errno 24] Too many open files. The problem happens because I build a dictionary with many files which are opened, as shown in the code below. Since I do not have a handle to close these open files, we see this error. This leads to the following questions:

  1. Is there a way to close these files in chunks?
  2. Does the requests module close the open files automatically?

     import requests

     url = 'http://httpbin.org/post'
     # dict with several files, more than 256 of them
     files = {'file1': open('report.xls', 'rb'), 'file2': open('report2.xls', 'rb')}
     r = requests.post(url, files=files)
     r.text

The workaround I am using right now is to call files.clear() after uploading fewer than 256 files at a time. I am unsure whether the files actually get closed by doing so, but the error goes away.

Please provide insight on how to handle this situation. Thanks.

The simplest solution here is to read the files into memory yourself, then pass them to requests. Note that, as the docs say, "If you want, you can send strings to be received as files". So, do that.

In other words, instead of building a dict like this:

files = {tag: open(pathname, 'rb') for (tag, pathname) in stuff_to_send}

… build it like this:

def read_file(pathname):
    with open(pathname, 'rb') as f:
        return f.read()
files = {tag: read_file(pathname) for (tag, pathname) in stuff_to_send}

Now you've only got one file open at a time, guaranteed.

This may seem wasteful, but it really isn't: requests is just going to read in all of the data from all of your files anyway if you don't.*
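For concreteness, here is a minimal end-to-end sketch of that approach; the stuff_to_send list is just an illustrative stand-in for however you enumerate your (tag, pathname) pairs:

import requests

def read_file(pathname):
    with open(pathname, 'rb') as f:
        return f.read()

url = 'http://httpbin.org/post'
stuff_to_send = [('file1', 'report.xls'), ('file2', 'report2.xls')]  # ... and many more

# Every file is read and closed immediately, so the dict holds only bytes,
# and no file descriptors are left open while the request is built and sent.
files = {tag: read_file(pathname) for (tag, pathname) in stuff_to_send}
r = requests.post(url, files=files)
print(r.status_code)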

But meanwhile, let me answer your actual questions rather than just telling you what to do instead.


Since I do not have a handle to close these open files, we see this error.

Sure you do. You have a dict, whose values are these open files.

In fact, if you didn't have a handle to them, this problem would probably occur much less often, because the garbage collector would (usually, but not reliably enough to count on) take care of things for you. The fact that it never does so implies that you must have a handle to them.


Is there a way to close these files in chunks?

Sure. I don't know how you're doing the chunks, but presumably each chunk is a list of keys or something, and you're passing files = {key: files[key] for key in chunk}, right?

So, after the request, do this:

for key in chunk:
    files[key].close()

Or, if you're building a dict for each chunk like this:

files = {tag: open(filename, 'rb') for (tag, filename) in chunk}

… just do this:

for file in files.values():
    file.close()
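Putting those pieces together, a chunked upload loop might look something like this sketch; the chunk size of 200 and the all_files list are illustrative assumptions, not part of the original code:

import requests

url = 'http://httpbin.org/post'
all_files = [('file1', 'report.xls'), ('file2', 'report2.xls')]  # ... many more pairs
CHUNK_SIZE = 200  # stay safely below the per-process open-file limit

for start in range(0, len(all_files), CHUNK_SIZE):
    chunk = all_files[start:start + CHUNK_SIZE]
    files = {tag: open(filename, 'rb') for (tag, filename) in chunk}
    try:
        r = requests.post(url, files=files)
    finally:
        # Close every file in this chunk before opening the next batch.
        for f in files.values():
            f.close()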

Does the requests module close the open files automatically?

No. You have to do it manually.

In many use cases, you get away with never doing so because the files variable goes away soon after the request, and once nobody has a reference to the dict, it gets cleaned up soon (immediately with CPython, if there are no cycles; just "soon" otherwise), meaning all the files get cleaned up soon, at which point the destructor closes them for you. But you shouldn't rely on that. Always close your files explicitly.
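If you're on Python 3.3+ (or have the contextlib2 backport), one idiomatic way to do that, not mentioned in the original answer, is contextlib.ExitStack, which guarantees that every file it manages gets closed when the with block exits, even if the request raises:

import contextlib
import requests

url = 'http://httpbin.org/post'
chunk = [('file1', 'report.xls'), ('file2', 'report2.xls')]  # one chunk of (tag, filename) pairs

with contextlib.ExitStack() as stack:
    # enter_context registers each file so it is closed when the block exits,
    # whether the POST succeeds or raises.
    files = {tag: stack.enter_context(open(filename, 'rb'))
             for (tag, filename) in chunk}
    r = requests.post(url, files=files)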

And the reason files.clear() seems to work is that it does the same thing as letting files go away: it forces the dict to forget all the files, which removes the last reference to each of them, meaning they will get cleaned up soon, and so on.


* What if you don't have enough page space to hold them all in memory? Then you can't send them all at once anyway. You'll have to make separate requests, or use the streaming API, which I believe means you have to do the multiparting manually as well. But if you have enough page space, just not enough real RAM, so that trying to read them all sends you into swap-thrashing hell, you might be able to get around it by concatenating them all on disk, opening the giant file, mmapping segments of it, and sending those as the strings…
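If you wanted to try that last trick, it might look roughly like the sketch below. The combined.bin file and the index of (tag, offset, length) entries are hypothetical; you would build them yourself while concatenating the originals. Note that slicing an mmap does materialize that segment as bytes:

import mmap
import requests

url = 'http://httpbin.org/post'
# Hypothetical index recorded while concatenating the originals into combined.bin.
index = [('file1', 0, 1024), ('file2', 1024, 2048)]

with open('combined.bin', 'rb') as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    # Slicing the mmap yields plain bytes, which requests sends as file contents,
    # so only a single real file descriptor is open here.
    files = {tag: mm[offset:offset + length] for (tag, offset, length) in index}
    r = requests.post(url, files=files)
    mm.close()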

Don't forget about the power of Python duck typing!

Just implement a wrapper class for your files:

import requests

class LazyFile(object):
    """File-like wrapper that only opens the underlying file when read."""

    def __init__(self, filename, mode):
        self.filename = filename
        self.mode = mode

    def read(self):
        # Open, read, and close the file only when requests asks for the data.
        with open(self.filename, self.mode) as f:
            return f.read()

url = 'http://httpbin.org/post'
# dict with a billion files
files = {'file1': LazyFile('report.xls', 'rb'), 'file2': LazyFile('report2.xls', 'rb')}
r = requests.post(url, files=files)
r.text

In this way, each file is opened, read, and closed one at a time as requests iterates over the dict.

Note that while this answer and abarnert's answer basically do the same thing right now, requests may, in the future, not build the request entirely in memory and then send it, but instead send each file in a stream, keeping memory usage low. At that point this code would be more memory efficient.
