
Running a Python script on many files

I have a set of files aaa_cntrl.txt, bbb_cntrl.txt, ..., zzz_cntrl.txt. I want to run a Python script, script.py, on each of these files and produce the outputs aaa_out.txt, bbb_out.txt, ..., zzz_out.txt.

My Python script is:

import sys
file_in = sys.argv[1]   # sys.argv[0] is the script name itself
file_out = sys.argv[2]
print("This is the input file", file_in)
print("This is the output file", file_out)

The command line is python script.py aaa_cntrl.txt aaa_out.txt

But I want to automatically specify the input as *_cntrl.txt and get the output as *_out.txt. How do I do this?

Your shell (at least on Linux/Unix) will expand the wildcard for you, so you need to loop over all input files and determine each output name inside the script.

import sys

for file_in in sys.argv[1:]:
    # splits at the first underscore, so a prefix containing '_' breaks it;
    # something more reliable is probably required in production
    file_out = file_in.split('_', 1)[0] + '_out.txt'

    print("This is the input file", file_in)
    print("This is the output file", file_out)
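
The comment in the loop above asks for something more reliable. One possible approach, shown here only as a sketch, is to swap the trailing suffix instead of splitting at the first underscore. The helper name output_name_for and the two suffix constants are hypothetical; the suffixes themselves are the naming convention from the question.

```python
import os

IN_SUFFIX = "_cntrl.txt"   # assumed naming convention, taken from the question
OUT_SUFFIX = "_out.txt"


def output_name_for(file_in):
    """Return the output path for an input path ending in IN_SUFFIX."""
    directory, name = os.path.split(file_in)
    if not name.endswith(IN_SUFFIX):
        raise ValueError("unexpected input file name: %r" % file_in)
    # Swap only the trailing suffix, so underscores elsewhere in the
    # file name or in the directory part are left untouched.
    stem = name[:-len(IN_SUFFIX)]
    return os.path.join(directory, stem + OUT_SUFFIX)


for example in ["aaa_cntrl.txt", "my_run_cntrl.txt"]:
    print(example, "->", output_name_for(example))
```

Inside the loop over sys.argv[1:], file_out = output_name_for(file_in) would then replace the split-based line. Rejecting names that do not match the convention is a design choice; skipping them with a warning would work too.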

I did exactly that a couple of days ago using argparse.

import argparse

parser = argparse.ArgumentParser()
parser.add_argument('files', nargs='+')
args = parser.parse_args()

for f in args.files:
    process(f)  # process() stands for your own per-file function

Then you just have to call your script with ./myscript.py *_cntrl.txt, and your shell will perform the expansion, as pointed out by Willem Van Onsem in the comments.
You can also take a look at argparse.FileType to improve this code.
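
The call above relies on the shell expanding *_cntrl.txt. Some shells (the Windows cmd.exe prompt, for example) pass the pattern through unchanged, and in that case the script can expand it itself with the standard glob module. The following is a minimal sketch of that idea; the function names expand_patterns and parse_files are hypothetical.

```python
import argparse
import glob


def expand_patterns(patterns):
    """Expand wildcard patterns; keep arguments that match nothing as-is."""
    files = []
    for pattern in patterns:
        matches = sorted(glob.glob(pattern))
        # A plain, existing file name simply matches itself. An argument
        # with no match is passed through so the caller can report it.
        files.extend(matches if matches else [pattern])
    return files


def parse_files(argv=None):
    """Parse the command line (sys.argv[1:] by default) into a file list."""
    parser = argparse.ArgumentParser()
    parser.add_argument('files', nargs='+')
    args = parser.parse_args(argv)
    return expand_patterns(args.files)
```

With this in place, for f in parse_files(): process(f) behaves the same whether or not the shell has already expanded the wildcard.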

If your file names are well formatted, I suggest deriving the name of the output file from the input file automatically. That is, if xxx_cntrl.txt always becomes xxx_out.txt, you can simply do file_out = file_in.replace("cntrl", "out").

Otherwise, you can do something like:

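import argparse

parser = argparse.ArgumentParser()
parser.add_argument('files', nargs='+')
parser.add_argument('-o', nargs='+')
args = parser.parse_args()

# pair each input file with the output file at the same position
for in_file, out_file in zip(args.files, args.o):
    process(in_file, out_file)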
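
One thing to watch with this pairing: zip() stops at the shorter list, so passing three inputs and two outputs would silently skip a file. A possible guard, sketched below, compares the two lengths before pairing. The function name parse_pairs is hypothetical, and making -o required is an added assumption so that args.o is never None.

```python
import argparse


def parse_pairs(argv=None):
    """Return a list of (input, output) pairs from the command line."""
    parser = argparse.ArgumentParser()
    parser.add_argument('files', nargs='+')
    # required=True is an assumption made for this sketch
    parser.add_argument('-o', nargs='+', required=True)
    args = parser.parse_args(argv)
    # zip() would silently drop the extra entries of the longer list;
    # parser.error() prints the usage message and exits instead.
    if len(args.files) != len(args.o):
        parser.error("got %d input files but %d output files"
                     % (len(args.files), len(args.o)))
    return list(zip(args.files, args.o))
```

A call such as ./myscript.py a_cntrl.txt b_cntrl.txt -o a_out.txt b_out.txt would then yield two pairs, while a mismatched call exits with a usage error.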

In any case, I really recommend the argparse module over parsing sys.argv manually.
