How to improve performance of python cgi that reads a big file and returns it as a download?

I have this python cgi script that checks that it hasn't been accessed too many times from the same IP, and if everything is OK, reads a big file from disk (11MB) and then returns it as a download.

It works, but performance sucks. The bottleneck seems to be reading this huge file over and over:

import os

def download_demo():
    """
    Returns the demo file as an HTTP download.
    """
    # The whole 11MB file is read into memory on every request.
    file = open(FILENAME, 'rb')
    buff = file.read()

    print "Content-Type:application/x-download\nContent-Disposition:attachment;filename=%s\nContent-Length:%s\n\n%s" % (os.path.split(FILENAME)[-1], len(buff), buff)

How can I make this faster? I thought of using a RAM disk to keep the file, but there must be a better solution. Would using mod_wsgi instead of a cgi script help? Would I be able to keep the big file in Apache's memory space?

Any help is greatly appreciated.

Use mod_wsgi and use something akin to:

def application(environ, start_response):
    status = '200 OK'
    response_headers = [('Content-type', 'text/plain')]
    start_response(status, response_headers)

    # Hand the open file to mod_wsgi, which can serve it with
    # sendfile/mmap instead of reading it into Python memory.
    file = open('/usr/share/dict/words', 'rb')
    return environ['wsgi.file_wrapper'](file)

In other words, use the wsgi.file_wrapper extension of the WSGI standard to allow Apache/mod_wsgi to deliver the file contents in an optimised way using sendfile/mmap. This avoids your application even needing to read the file into memory.

Why are you printing it all in one print statement? Python has to generate several temporary strings to handle the content headers, and because of that last %s, it has to hold the entire contents of the file in two different string variables. This should be better:

print "Content-Type:application/x-download\nContent-Disposition:attachment;filename=%s\nContent-Length:%s\n\n" %    (os.path.split(FILENAME)[-1], len(buff))
print buff

You might also consider reading the file using the raw IO module so Python doesn't create temp buffers that you aren't using.
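
For instance, a minimal sketch of that idea using io.FileIO, the raw unbuffered file object (the function name and error handling are illustrative, not from the answer; requires Python 2.6+ for the io module):

import io

def read_raw(path):
    # io.FileIO is the raw, unbuffered layer; read() returns the whole
    # file without an intermediate buffering layer copying it first.
    f = io.FileIO(path, 'r')
    try:
        return f.read()
    finally:
        f.close()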

Try reading and outputting (i.e. buffering) a chunk of, say, 16KB at a time. Python is probably doing something slow behind the scenes, and manually buffering may be faster.
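
A minimal sketch of such a chunked loop (stream_file and the 16KB constant are illustrative choices, not code from the answer):

import sys

CHUNK_SIZE = 16 * 1024  # 16KB per read, as suggested above

def stream_file(path):
    f = open(path, 'rb')
    try:
        while True:
            chunk = f.read(CHUNK_SIZE)
            if not chunk:  # empty string signals EOF
                break
            sys.stdout.write(chunk)
    finally:
        f.close()

This also keeps memory use flat, since only CHUNK_SIZE bytes are held at a time instead of the whole 11MB.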

You shouldn't have to use e.g. a ramdisk - the OS disk cache ought to cache the file contents for you.

mod_wsgi or FastCGI would help in the sense that you don't need to reload the Python interpreter every time your script is run. However, they'd do little to improve the performance of reading the file (if that's really your bottleneck). I'd advise you to use something along the lines of memcached instead.
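
A minimal sketch of that caching idea with the python-memcached client (the client choice, key name, and size settings are assumptions; note that memcached's default 1MB item limit would have to be raised on the server, e.g. with its -I option, before an 11MB file could be stored):

import memcache

# server_max_value_length must match the raised limit on the server
mc = memcache.Client(['127.0.0.1:11211'],
                     server_max_value_length=16 * 1024 * 1024)

def get_demo_file(path):
    # Serve repeated downloads from memcached instead of re-reading disk.
    buff = mc.get('demo_file')
    if buff is None:
        f = open(path, 'rb')
        buff = f.read()
        f.close()
        mc.set('demo_file', buff)
    return buff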
