
How to improve python import speed?

This question has been asked many times on SO (for instance here), but there is no real answer yet.

I am writing a short command line tool that renders templates. It is driven by a Makefile:

i = $(wildcard *.in)
o = $(patsubst %.in, %.out, $(i))

all: $(o)

%.out: %.in
    ./script.py -o $@ $<

In this dummy example, the Makefile parses every .in file to generate an .out file. It is very convenient for me to use make because I have a lot of other actions to trigger before and after this script. Moreover I would like to remain as KISS as possible.

Thus, I want to keep my tool simple, stupid, and process each file separately using the syntax script -o out in.

My script uses the following:

#!/usr/bin/env python
from jinja2 import Template, nodes
from jinja2.ext import Extension
import hiyapyco
import argparse
import re   

...

The problem is that each execution costs me about 1.2s (~60ms for the processing and ~1140ms for the import directives):

$ time ./script.py -o foo.out foo.in
real    0m1.625s
user    0m0.452s
sys     0m1.185s

The overall execution of my Makefile for 100 files is ridiculous: ~100 files x 1.2s = 120s.

This is not a solution, but this should be the solution. 这不是解决方案,但应该是解决方案。

What alternative can I use?

EDIT

I love Python because of its readable syntax and the size of its community. In this particular case (command line tools), I have to admit Perl is still a decent alternative. The same script written in Perl (which is also an interpreted language) is about 12 times faster (using Text::Xslate).

I don't want to promote Perl in any way; I am just trying to solve my biggest issue with Python: because of the poor import time, it is not yet a suitable language for simple command line tools.

It is not quite easy, but you could turn your program into one that sits in the background and processes commands telling it which file to process.

Another program could then feed the processing commands to it, so the actual start-up for each file becomes very cheap.

Write the template part as a separate process. The first time "script.py" is run it would launch this separate process. Once the process exists, it can be passed the input/output filenames via a named pipe. If the process gets no input for x seconds, it automatically exits. How big x is depends on your needs.

So the parameters are passed to the long-running process by script.py writing to a named pipe. The imports only occur once (provided the inputs come fairly often), and as BPL points out this would make everything run faster.
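A minimal sketch of that idea, assuming a pipe at /tmp/render.pipe, a 30-second idle timeout and a placeholder render() helper standing in for the real jinja2/hiyapyco work (none of these names come from the question):

#!/usr/bin/env python
# render_daemon.py -- pay the heavy imports once, then serve "in out"
# filename pairs read from a named pipe until no work arrives for a while.
import os
import select

from jinja2 import Template   # heavy imports happen once, at daemon start

PIPE_PATH = "/tmp/render.pipe"   # assumed location of the named pipe
IDLE_TIMEOUT = 30                # seconds without input before the daemon exits

def render(in_path, out_path):
    # placeholder for the real template rendering
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.write(Template(src.read()).render())

def serve():
    if not os.path.exists(PIPE_PATH):
        os.mkfifo(PIPE_PATH)
    # open the read end non-blocking so select() can enforce the idle timeout,
    # and keep a dummy write end open so reads never hit EOF between clients
    rfd = os.open(PIPE_PATH, os.O_RDONLY | os.O_NONBLOCK)
    wfd = os.open(PIPE_PATH, os.O_WRONLY)
    buf = ""
    while True:
        ready, _, _ = select.select([rfd], [], [], IDLE_TIMEOUT)
        if not ready:
            break                             # idle too long: shut down
        buf += os.read(rfd, 4096).decode()
        while "\n" in buf:
            line, buf = buf.split("\n", 1)
            if line.strip():
                in_path, out_path = line.split()
                render(in_path, out_path)
    os.close(rfd)
    os.close(wfd)
    os.remove(PIPE_PATH)

if __name__ == "__main__":
    serve()

The Makefile rule then only has to write one line to the pipe (for example echo "$< $@" > /tmp/render.pipe), so the per-file cost is a shell redirection instead of a fresh Python interpreter. Note that this sketch ignores synchronisation: make would consider a target done as soon as the line is written, so a real version would need some acknowledgement from the daemon.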

You could use glob to perform those actions on the files you need.

import glob

in_files = glob.glob('*.in')
# derive each output name from its input; the .out files may not exist yet,
# so globbing '*.out' would not find them
out_files = [f[:-3] + '.out' for f in in_files]

Thus, you process all the files in the same script, instead of calling the script every time with every pair of files. At least that way you don't have to start python every time.
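A minimal sketch of that batch approach, assuming a hypothetical render_file() helper that wraps the existing jinja2/hiyapyco logic:

import glob

def render_file(in_path, out_path):
    # hypothetical stand-in for the existing template rendering code
    pass

# one interpreter start, one set of imports, every file handled in the loop
for in_path in glob.glob('*.in'):
    render_file(in_path, in_path[:-3] + '.out')   # foo.in -> foo.out

The trade-off is that make no longer tracks per-file dependencies, so every .in file gets re-rendered on each run.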

It seems quite clear where the problem is; right now you've got:

cost(file) = 1.2s = 60ms + 1140ms, which means:

cost(N*files) = N*1.2s

now, why don't you change it to become:

cost1(N*files) = 1140ms + N*60ms

that way, theoretically processing 100 files would take about 7.14s instead of 120s

EDIT:

Because I'm receiving downvotes, I'll post a little example. Let's assume you've got this python file:

# foo.py
import sys

import numpy
import cv2

print(sys.argv[0])

The execution time is 1.3s on my box. Now, if I do:

for /l %x in (1, 1, 100) do python foo.py

I'll get 100*1.3s of execution time. My proposal was to turn foo.py into this:

import sys

import numpy
import cv2

def whatever_rendering_you_want_to_do(file):
    pass

# skip argv[0] (the script name) and process every file given on the command line
for file in sys.argv[1:]:
    whatever_rendering_you_want_to_do(file)

That way you're importing only once instead of 100 times.
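Invoked once for the whole batch, for example as python foo.py *.in with the shell expanding the list, the heavy numpy/cv2 imports are paid a single time rather than once per file.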
