I have a third-party executable that I call using subprocess.check_output. Unfortunately my argument list is too long to pass in a single call, and calling the executable repeatedly is much slower than calling it once with many arguments.
Slow, because it runs the command once per file:

def call_third_party_slow(third_party_path, files):
    for file in files:
        output = subprocess.check_output([third_party_path, "-z", file])
        if "sought" in output.decode():
            return False
    return True
Fast, but fails when there are many files:

def call_third_party_fast(third_party_path, files):
    command = [third_party_path, "-z"]
    command.extend(files)
    output = subprocess.check_output(command)
    if "sought" in output.decode():
        return False
    return True
Is there an easy way to work around the command-length limit, or to group the files so that the OS-dependent limit is never exceeded?
You could batch the files list like this:

def batch_args(args, arg_max):
    current_arg_length = 0
    current_list = []
    for arg in args:
        # the +1 accounts for the space separating the arguments;
        # the current_list check avoids yielding an empty first batch
        if current_list and current_arg_length + len(arg) + 1 > arg_max:
            yield current_list
            current_list = [arg]
            current_arg_length = len(arg)
        else:
            current_list.append(arg)
            current_arg_length += len(arg) + 1
    if current_list:
        yield current_list
So the method body would look like this:

os_limit = 10  # placeholder value; adjust to the real OS limit
for args in batch_args(files, os_limit):
    command = [third_party_path, "-z"]
    command.extend(args)
    output = subprocess.check_output(command)
    if "sought" in output.decode():
        return False
return True
Two things I'm not sure about:
Adjust arg_max to what the OS allows; there is probably a way to find this out per OS. Here is some info about the maximum argument size on some OSes. That site also states there is a 32k limit on Windows.
Maybe there is a better way to do it using the subprocess library, but I'm not sure.
Also, I'm not doing any exception handling (e.g. a single argument longer than the maximum size).
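On POSIX systems the limit can be queried at runtime instead of hard-coded; a minimal sketch (the fallback default is an assumption, roughly matching the 32k Windows figure mentioned above):

```python
import os

def get_arg_max(default=32000):
    # os.sysconf is only available on POSIX systems; fall back to a
    # conservative default elsewhere (e.g. on Windows).
    try:
        return os.sysconf("SC_ARG_MAX")
    except (AttributeError, ValueError, OSError):
        return default
```

Note that the usable space is smaller than ARG_MAX, because the environment variables and the base command also count against the limit.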
I solved this by using a temporary file on Windows. On Linux the command could be executed as is.
Method to build the full command for the different platforms:
import os
import platform
import tempfile

temporary_file = None

def make_full_command(base_command, files):
    command = list(base_command)
    if platform.system() == "Windows":
        global temporary_file
        temporary_file = tempfile.NamedTemporaryFile()
        posix_files = map(lambda f: f.replace(os.sep, '/'), files)
        temporary_file.write(str.encode(" ".join(posix_files)))
        temporary_file.flush()
        command.append("@" + temporary_file.name)
    else:
        command.extend(files)
    return command
Keeping the file in a global variable ensures it stays alive while the command runs and is cleaned up automatically afterwards.
This way I didn't have to find the maximum command length for different OSes.
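A variant that avoids the global is to return the NamedTemporaryFile and let the caller hold the reference; the file is removed from disk as soon as the object is closed or garbage-collected, so it must outlive the subprocess call. A sketch of that variant (the helper name is illustrative, and the "@file" response-file syntax only works if the third-party tool supports it):

```python
import tempfile

def make_args_file(files):
    # The caller must keep this object alive until after the
    # third-party command has finished: NamedTemporaryFile deletes
    # the underlying file when it is closed or garbage-collected.
    tmp = tempfile.NamedTemporaryFile(mode="w", suffix=".txt")
    tmp.write(" ".join(f.replace("\\", "/") for f in files))
    tmp.flush()
    return tmp  # pass "@" + tmp.name to the tool, keep the reference
```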
If you don't want to reinvent an optimal solution, use a tool which already implements exactly this: xargs.
def call_third_party_slow(third_party_path, files):
    result = subprocess.run(['xargs', '-r', '-0', third_party_path, '-z'],
        input='\0'.join(files) + '\0', stdout=subprocess.PIPE,
        check=True, universal_newlines=True)
    if "sought" in result.stdout:
        return False
    return True
You'll notice I also switched to subprocess.run(), which is available in Python 3.5+.
If you do want to reimplement xargs, you will need to find the value of the kernel constant ARG_MAX and build a command-line list whose size never exceeds this limit. Then you could check after each iteration if the output contains sought, and quit immediately if it does.
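Those steps can be sketched as follows (the function names and the halving safety margin are illustrative assumptions, not taken from xargs itself):

```python
import os
import subprocess

def batched_outputs(base_command, args, arg_max=None):
    # Query ARG_MAX where possible; halve it as a crude margin for
    # the environment and the base command (assumed, not exact).
    if arg_max is None:
        try:
            arg_max = os.sysconf("SC_ARG_MAX") // 2
        except (AttributeError, ValueError, OSError):
            arg_max = 16000
    batch, size = [], 0
    for arg in args:
        # +1 per argument accounts for the separating space
        if batch and size + len(arg) + 1 > arg_max:
            yield subprocess.check_output(base_command + batch)
            batch, size = [], 0
        batch.append(arg)
        size += len(arg) + 1
    if batch:
        yield subprocess.check_output(base_command + batch)

def call_third_party(third_party_path, files):
    for output in batched_outputs([third_party_path, "-z"], files):
        if "sought" in output.decode():
            return False  # quit immediately, skipping later batches
    return True
```

Because batched_outputs is a generator, later batches are never launched once the caller stops iterating.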