Python looping over a process with the output as one of the inputs

I have a Python function that takes two input files and writes one output file:

def process(fin1,fin2,fout):
    # fileout, readcsv, writeheaders, sub and matchingfields are my own helpers (not shown)
    outf = fileout(fout)
    x = readcsv(fin1)
    y = readcsv(fin2)
    hlen = writeheaders(fin1,fin2,outf)
    sub(matchingfields(x,y),x,y,hlen,outf)
    outf.close()

I can run it easily enough like this (specifying the name of the output file):

process('csv1a.csv','csv2b.csv','OUTv1.csv')

I want to pass in more than two files (by dragging and dropping them onto a batch file, or from cmd):

filenames = sys.argv[1:]

So the question is: how can I easily (recursively?) loop through my process so that I:

run process on filenames[0] with filenames[1]

run process on the output of (filenames[0] + filenames[1]) with filenames[2]

run process on the output of (filenames[0] + filenames[1] + filenames[2]) with filenames[3]

etc.

I am quite new to programming and can't quite figure out the best way to approach this problem. Thanks in advance!

You already found sys.argv; you can use len(sys.argv) - 1 to find out how many files were passed to your script (sys.argv[0] is the script itself). In general, len() gives you the length of a list.
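A minimal sketch of that count; the helper name `input_files` is hypothetical and only there for illustration:

```python
import sys

def input_files(argv):
    # argv[0] is the path of the script itself; everything after it
    # is a file name passed on the command line (or dropped on the .bat)
    return argv[1:]

if __name__ == "__main__":
    files = input_files(sys.argv)
    # len(files) is the same number as len(sys.argv) - 1
    print("{} file(s) passed: {}".format(len(files), files))
```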

You would then write a for loop over the input files that were passed in, starting at the third one and skipping the last argument, since that is the output file. For each input file, run your existing function with the current output file and the new file from this iteration, and write the result to the output file.

Before the loop, run your existing function once with input files 1 and 2, writing to the output file.
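A minimal sketch of that loop, assuming (as this answer does) that the last command-line argument is the output file. The `process()` below is a hypothetical stand-in that just concatenates lines so the example runs on its own; swap in your real function. One caveat: your `process()` calls `fileout(fout)` before `readcsv(fin1)`, so if `fileout` opens the file for writing, passing the same file as both `fin1` and `fout` would empty it before it is read. The sketch therefore writes each later pass to a temporary file (the `.tmp` naming is an assumption) and moves it over the output with `os.replace`.

```python
import os
import sys

def process(fin1, fin2, fout):
    # Hypothetical stand-in for the real process(): it simply writes
    # the lines of fin1 followed by the lines of fin2 to fout.
    with open(fin1) as f1, open(fin2) as f2:
        lines = f1.read().splitlines() + f2.read().splitlines()
    with open(fout, "w") as out:
        out.write("\n".join(lines) + "\n")

def run_all(inputs, output):
    # First pass: input files 1 and 2 -> output file.
    process(inputs[0], inputs[1], output)
    # Later passes: previous output + next input -> temporary file,
    # which then replaces the output. Writing to a separate file avoids
    # truncating the output while it is still being read as fin1.
    tmp = output + ".tmp"  # hypothetical temp-file naming
    for fname in inputs[2:]:
        process(output, fname, tmp)
        os.replace(tmp, output)

if __name__ == "__main__":
    # The last argument is the output file; everything before it is input.
    if len(sys.argv) < 4:
        print("Usage: python {} in1.csv in2.csv [...] out.csv".format(sys.argv[0]))
    else:
        run_all(sys.argv[1:-1], sys.argv[-1])
```

With the batch file below, this would be invoked as e.g. `python myscript.py a.csv b.csv c.csv out.csv`.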

For drag-and-drop, create a batch file that contains:

@echo off
python myscript.py %*

and save it as myscript.bat. You can now drag and drop files onto it, and it will pass all the filenames as arguments to your script.

Rather than shuttling all your data back and forth through files repeatedly, I suggest processing it in memory and then writing the final result to disk once.

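import csv
import sys

def read_csv(fname):
    # read the whole csv file into memory as a list of rows
    with open(fname, newline="") as f:
        return list(csv.reader(f))

def write_csv(data, fname):
    # write the rows back out to a csv file
    with open(fname, "w", newline="") as f:
        csv.writer(f).writerows(data)

def merge(data1, data2):
    # combine two tables into one
    # YOUR CODE GOES HERE (e.g. the matching your process() does)
    raise NotImplementedError

if __name__ == "__main__":
    if len(sys.argv) < 3:
        print("Usage: python {} file1.csv file2.csv [...]".format(sys.argv[0]))
    else:
        # start from the first file, then merge each further file into the result
        data = read_csv(sys.argv[1])
        for fname in sys.argv[2:]:
            more_data = read_csv(fname)
            data = merge(data, more_data)
        write_csv(data, "final.csv")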
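The loop in `__main__` is a left fold: `merge` is applied to the running result and each next file in turn. If you prefer, `functools.reduce` expresses the same thing. This sketch assumes a hypothetical `merge` that keeps the first table's header row and appends the data rows of the second; replace it with your real matching logic, and `merge_all` is likewise an illustrative name.

```python
import csv
import functools

def read_csv(fname):
    # read the whole csv file into memory as a list of rows
    with open(fname, newline="") as f:
        return list(csv.reader(f))

def merge(data1, data2):
    # Hypothetical merge: keep data1 (including its header row) and
    # append the data rows of data2, skipping data2's header.
    return data1 + data2[1:]

def merge_all(fnames):
    # Same as the for loop: reduce() starts with the first table and
    # folds merge() over the remaining ones, left to right.
    return functools.reduce(merge, (read_csv(f) for f in fnames))
```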

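Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.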

© 2020-2024 STACKOOM.COM