
for fi in sys.argv[1:]: argument list too long

I am trying to execute a Python script on all text files in a folder:

for fi in sys.argv[1:]:

And I get the following error:

-bash: /usr/bin/python: Argument list too long

I call the Python script like this:

python functionName.py *.txt

The folder has around 9000 files. Is there some way to run this script without having to split my data into more folders? Splitting the files would not be very practical, because I will have to run the script on even more files in the future. Thanks.

EDIT: Based on the accepted answer and the comments of its author (Charles Duffy), what worked for me is the following:

printf '%s\0' *.txt | xargs -0 python ./functionName.py

because my script does not have a valid shebang.

This is an OS-level problem (a limit on command line length), and is conventionally solved with an OS-level (or, at least, outside-your-Python-process) solution:

find . -maxdepth 1 -type f -name '*.txt' -exec ./your-python-program '{}' +

...or...

printf '%s\0' *.txt | xargs -0 ./your-python-program

Note that this runs your-python-program once per batch of files found, where the batch size depends on how many names fit within ARG_MAX; see the excellent answer by Marcus Müller if this is unsuitable.

No. That is a kernel limit on the length (in bytes) of a command line.

Typically, you can determine that limit by running

getconf ARG_MAX

which, at least for me, yields 2097152 (bytes), i.e. 2 MiB.
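The same limit can be read from inside Python, which makes it easy to estimate whether a given set of names would fit. The sketch below is only a rough check: `argv_bytes` and `fits_in_arg_max` are hypothetical helpers, and they ignore the environment variables that also count against the limit, so a positive result is an optimistic estimate rather than a guarantee.

```python
import glob
import os


def argv_bytes(names):
    """Approximate bytes the names occupy in an argument list.

    Each argument is passed as a NUL-terminated byte string, hence the +1.
    """
    return sum(len(os.fsencode(name)) + 1 for name in names)


def fits_in_arg_max(names):
    """Return True if the names alone are smaller than ARG_MAX.

    Assumption: a Unix-like system where os.sysconf knows SC_ARG_MAX.
    The environment is not counted here, so True is not a guarantee.
    """
    arg_max = os.sysconf("SC_ARG_MAX")  # same value as `getconf ARG_MAX`
    return argv_bytes(names) < arg_max


if __name__ == "__main__":
    names = glob.glob("*.txt")
    print(len(names), "names,", argv_bytes(names), "bytes; limit is",
          os.sysconf("SC_ARG_MAX"))
```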

I recommend using Python to work through the folder yourself, i.e. giving your Python program the ability to work with directories instead of individual files, or to read file names from a file.
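As a rough sketch of the directory-based option, the program below takes a folder and finds the text files itself, so the shell never has to build a long argument list. `find_text_files` and `process_file` are hypothetical names; the body of `process_file` stands in for whatever the script really does per file, and the `.txt` suffix is taken from the question.

```python
import os
import sys


def find_text_files(root, suffix=".txt"):
    """Yield paths of files under root whose names end with suffix."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if name.endswith(suffix):
                yield os.path.join(dirpath, name)


def process_file(path):
    """Placeholder for the real per-file work (hypothetical)."""
    with open(path, "rb") as handle:
        return len(handle.read())


def main(argv):
    """Process every text file under the directory given as first argument."""
    if not argv:
        print("usage: functionName.py DIRECTORY")
        return
    for path in find_text_files(argv[0]):
        print(path, process_file(path))


if __name__ == "__main__":
    main(sys.argv[1:])
```

It would be called as `python functionName.py /path/to/folder`. Because `os.walk` also descends into subdirectories, this behaves like `find` without `-maxdepth 1`.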

The former can easily be done using os.walk(...), whereas the second option is (in my opinion) the more flexible one. Use the argparse module to give your Python program an easy-to-use command line syntax, then add an argument of a file type (see the reference documentation), and Python will automatically understand special filenames like -, meaning that instead of

for fi in sys.argv[1:]

do

for fi in opts.file_to_read_filenames_from.read().split(chr(0))

which would even allow you to do something like

find . -iname '*.txt' -type f -print0 | my_python_program.py -file-to-read-filenames-from -
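Putting those pieces together, a minimal sketch of such a program might look like the following. The option name and `opts.file_to_read_filenames_from` come from the answer; `build_parser` and `read_names` are hypothetical helpers, and the `print` call stands in for the real per-file work. One detail worth noting: `find -print0` ends every name with a NUL, so a plain `split` leaves an empty string at the end, which the sketch filters out. It also assumes the names decode cleanly in the stream's text encoding.

```python
import argparse
import sys


def build_parser():
    """Build a parser with one option that names a file of NUL-separated paths."""
    parser = argparse.ArgumentParser(description="Process files listed on a stream.")
    parser.add_argument(
        "-file-to-read-filenames-from",
        dest="file_to_read_filenames_from",
        type=argparse.FileType("r"),  # the value "-" is mapped to standard input
        help="file holding NUL-separated names, or - for stdin",
    )
    return parser


def read_names(stream):
    """Split NUL-separated names, dropping the empty piece after the last NUL."""
    return [name for name in stream.read().split("\0") if name]


def main(argv):
    parser = build_parser()
    opts = parser.parse_args(argv)
    if opts.file_to_read_filenames_from is None:
        parser.print_usage()
        return
    for fi in read_names(opts.file_to_read_filenames_from):
        print(fi)  # placeholder for the real per-file work


if __name__ == "__main__":
    main(sys.argv[1:])
```

Since the names arrive on a stream rather than in the argument list, the number of files is no longer bounded by ARG_MAX, and the program still runs only once.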

Don't do it this way. Pass the mask to your Python script (e.g. call it as python functionName.py "*.txt" ) and expand it using glob ( https://docs.python.org/2/library/glob.html ).

I would consider using the glob module. With this module you invoke your program like:

python functionName.py "*.txt"

Then the shell will not expand *.txt into file names. Your Python program will receive *.txt in its argument list, and you can pass it to glob.glob():

for fi in glob.glob(sys.argv[1]):
    ...
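A slightly fuller sketch of this approach is shown below. `expand_pattern` is a hypothetical helper, and the `print` call stands in for the real per-file work. The pattern has to be quoted on the command line, otherwise the shell expands it first and the original error comes back.

```python
import glob
import sys


def expand_pattern(pattern):
    """Expand a shell-style pattern inside Python, in sorted order."""
    return sorted(glob.glob(pattern))


def main(argv):
    if not argv:
        print('usage: functionName.py "PATTERN" ["PATTERN" ...]')
        return
    for pattern in argv:
        for fi in expand_pattern(pattern):
            print(fi)  # placeholder for the real per-file work


if __name__ == "__main__":
    main(sys.argv[1:])
```

Here the only argument the kernel sees is the short pattern string itself, so the number of matching files no longer matters.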
