
Bash parallel process on multiple input files, one output file

I created a for loop that goes through multiple files and outputs the results into one file:

for x in /home/moleculo/x*; do ExtractOutCalls2.sh /home/Scripts/000 $x & done

Each of my input files starts with the letter x, hence x* as the input. The script takes each input file $x and writes to the file /home/Scripts/000.

Now I have two questions:

If this is done on a few thousand files, is this a good way to do it?

Also, if I use multiple input files but specify one output file, does this mean that my output won't be appended? If not, how can I do it?

Regards, Irek

Yes, your output file gets overwritten by each process. Make each script write to its own file and, once all the scripts have finished, concatenate the output:

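i=0
for x in /home/moleculo/x* ; do
    # Each job writes its standard output to its own numbered file.
    ExtractOutCalls2.sh /home/Scripts/000 "$x" > "OUT.$i" &
    (( i++ ))
done
wait                # block until every background job has finished
cat OUT.* > OUT     # the glob sorts lexically, so OUT.10 comes before OUT.2
rm OUT.*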

You have to change the script to write to standard output instead of the file, or make it accept the name of the output file to create.
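With a few thousand inputs, the loop above starts every job at once. The following is a minimal sketch of one way to cap that by running the jobs in batches. It assumes the script has already been changed to write to standard output. Here `extract_calls` is a hypothetical stand-in for ExtractOutCalls2.sh, and the sample files and the `max_jobs` value are made up for illustration.

```shell
# Hypothetical stand-in for ExtractOutCalls2.sh: writes one input's result to stdout.
extract_calls() { tr 'a-z' 'A-Z' < "$1"; }

workdir=$(mktemp -d)                 # scratch directory with three sample inputs
printf 'one\n'   > "$workdir/xaa"
printf 'two\n'   > "$workdir/xab"
printf 'three\n' > "$workdir/xac"

max_jobs=2                           # assumed cap on simultaneous jobs
running=0
for x in "$workdir"/x*; do
    extract_calls "$x" > "$x.out" &  # one output file per input, named after it
    running=$((running + 1))
    if [ "$running" -ge "$max_jobs" ]; then
        wait                         # let the current batch finish before starting more
        running=0
    fi
done
wait                                 # catch the final, possibly partial, batch

combined=$(cat "$workdir"/x*.out)    # glob order follows the input names
printf '%s\n' "$combined"
rm -r "$workdir"
```

Naming each output after its input keeps the concatenation order tied to the input names rather than to a counter. In real use the final `cat` would be redirected to the combined output file instead of being captured in a variable.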

Often you can pass - as the file name to designate stdout:

for x in /home/moleculo/x*; do ExtractOutCalls2.sh - "$x" & done
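Whether ExtractOutCalls2.sh understands - depends on how it was written. As a rough sketch, a script that takes an output name followed by an input name could support the convention as shown here. The `process` function and the upper-casing step are hypothetical placeholders, not the real script's logic.

```shell
# Hypothetical sketch: treat "-" as standard output in a script taking OUTPUT INPUT.
# The upper-casing is a placeholder for the real processing.
process() {
    out=$1                                 # output file name, or "-" for standard output
    src=$2                                 # input file
    if [ "$out" = "-" ]; then
        tr 'a-z' 'A-Z' < "$src"            # write the result to stdout
    else
        tr 'a-z' 'A-Z' < "$src" > "$out"   # write the result to the named file
    fi
}

sample=$(mktemp)                           # throwaway input file
printf 'hello\n' > "$sample"
to_stdout=$(process - "$sample")           # result captured from stdout
process "$sample.out" "$sample"            # result written to a file instead
to_file=$(cat "$sample.out")
printf '%s\n%s\n' "$to_stdout" "$to_file"
rm -f "$sample" "$sample.out"
```

Both calls produce the same result; only the destination differs, which is what lets the caller decide where the output goes.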

To avoid mixing the output of different jobs, use GNU Parallel, which by default prints each job's output only once that job has finished:

parallel ExtractOutCalls2.sh - {} ::: /home/moleculo/x* > output
