How to speed up a bash script?
I have a very large tab-separated text file which I am parsing to obtain certain data. Since the input file is very large the script is very slow; how can I speed it up?
I tried using & with wait, which turned out a bit slower, and also nice (timed with time).
Update: a few lines of input.tsv
Names Number Cylinder torque HP cc others
chevrolet 18 8 307 130 3504 SLR=0.1;MIN=5;MAX=19;PR=0.008;SUM=27;SD=0.5;IQR=9.5;RANG=7.5;MP_R=0.0177;MX_R=9.118
buick 15 8 350 165 3693 SLR=0.7;MIN=7;MAX=17;PR=0.07;SUM=30;SD=2.5;IQR=7.5;RANG=9.5;MP_R=0.0197;MX_R=9.1541
satellite 18 8 318 150 3436 SLR=0.12;MIN=2;MAX=11;PR=0.065;SUM=17;SD=5.5;IQR=11.5;RANG=6.5;MP_R=0.0377;MX_R=9.154
rebel 16 8 304 150 3433 SLR=0.61;MIN=8;MAX=15;PR=0.04148;SUM=24;SD=4.5;IQR=12.5;RANG=9.5;MP_R=0.018;MX_R=9.186
torino 17 8 302 140 3449 SLR=0.2;MIN=4;MAX=14;PR=0.018;SUM=22;SD=1.5;IQR=7.5;RANG=5.5;MP_R=0.0141;MX_R=9.115
Thank you
extract.sh
#!/bin/bash
zcat input.tsv.gz | while IFS=$'\t' read -r a b c d e f g
do
    m=$(echo "$g" | awk -v key="MAX" -v RS=';' -F'=' '$1==key{print $2}')
    n=$(echo "$g" | awk -v key="MIN" -v RS=';' -F'=' '$1==key{print $2}')
    o=$(echo "$g" | awk -v key="SUM" -v RS=';' -F'=' '$1==key{print $2}')
    p=$(echo "$g" | awk -v key="SD" -v RS=';' -F'=' '$1==key{print $2}')
    q=$(echo "$g" | awk -v key="IQR" -v RS=';' -F'=' '$1==key{print $2}')
    r=$(echo "$g" | awk -v key="RANG" -v RS=';' -F'=' '$1==key{print $2}')
    printf '%s\t%s\t%s\t%s\t%s\t%s\tMAX=%s\tMIN=%s\tSUM=%s\tSD=%s\tIQR=%s\tRANG=%s\n' \
        "$a" "$b" "$c" "$d" "$e" "$f" "$m" "$n" "$o" "$p" "$q" "$r"
done
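Note that each iteration of this loop spawns six separate awk processes via command substitution, which dominates the runtime on a large file. Even before parallelizing, all six keys can be pulled out in a single awk invocation per line. A minimal sketch using the semicolon-delimited field from the first sample record (the variable name g mirrors the loop variable above):

```shell
# One awk call extracts every key at once: with RS=';' each "KEY=VALUE"
# chunk is its own record, and splitting on '=' fills a lookup table.
g='SLR=0.1;MIN=5;MAX=19;PR=0.008;SUM=27;SD=0.5;IQR=9.5;RANG=7.5'
printf '%s' "$g" | awk -v RS=';' -F'=' '
    { v[$1] = $2 }                      # remember each key/value pair
    END {
        printf "MAX=%s\tMIN=%s\tSUM=%s\tSD=%s\tIQR=%s\tRANG=%s\n",
               v["MAX"], v["MIN"], v["SUM"], v["SD"], v["IQR"], v["RANG"]
    }
'
```

For the sample value this prints the six wanted pairs, tab-separated, in one process instead of six.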
How do I modify it to run with xargs or parallel to speed up the process, or to instruct it to use more of the computer's resources?
In each of your records, the semicolon-delimited fields seem to contain the same keywords, in the same order, so you should be able to do something like this:
#!/bin/bash
zcat input.tsv.gz |
awk '
    BEGIN { OFS = "\t" }
    NR > 1 {
        split($7, a, ";")
        print $1, $2, $3, $4, $5, $6, a[3], a[2], a[5], a[6], a[7], a[8]
    }
'
chevrolet 18 8 307 130 3504 MAX=19 MIN=5 SUM=27 SD=0.5 IQR=9.5 RANG=7.5
buick 15 8 350 165 3693 MAX=17 MIN=7 SUM=30 SD=2.5 IQR=7.5 RANG=9.5
satellite 18 8 318 150 3436 MAX=11 MIN=2 SUM=17 SD=5.5 IQR=11.5 RANG=6.5
rebel 16 8 304 150 3433 MAX=15 MIN=8 SUM=24 SD=4.5 IQR=12.5 RANG=9.5
torino 17 8 302 140 3449 MAX=14 MIN=4 SUM=22 SD=1.5 IQR=7.5 RANG=5.5
As the semicolon-delimited fields can appear in random order, you'll need further processing to pick out the correct ones:
zcat input.tsv.gz |
awk '
    BEGIN { OFS = "\t" }
    NR > 1 {
        delete f
        n = split($7, a, ";")
        for (i = 1; i <= n; i++) {
            match(a[i], /^[^=]*/)
            f[ substr(a[i], RSTART, RLENGTH) ] = a[i]
        }
        print $1, $2, $3, $4, $5, $6, f["MAX"], f["MIN"], f["SUM"], f["SD"], f["IQR"], f["RANG"]
    }
'
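A variant of the same idea, in case it reads more clearly: split each KEY=VALUE chunk on "=" instead of using match()/RSTART to find the key. This is a sketch run on the first sample record only, not tested against the full file:

```shell
# Build the same lookup table with split() on "=" instead of match():
# f["MAX"] ends up holding the whole "MAX=19" chunk, as above.
printf 'chevrolet\t18\t8\t307\t130\t3504\tSLR=0.1;MIN=5;MAX=19;PR=0.008;SUM=27;SD=0.5;IQR=9.5;RANG=7.5\n' |
awk '
    BEGIN { OFS = "\t" }
    {
        delete f
        n = split($7, a, ";")
        for (i = 1; i <= n; i++) {
            split(a[i], kv, "=")        # kv[1] is the key, e.g. "MAX"
            f[kv[1]] = a[i]
        }
        print $1, $2, $3, $4, $5, $6,
              f["MAX"], f["MIN"], f["SUM"], f["SD"], f["IQR"], f["RANG"]
    }
'
```

Either way the whole job stays in a single awk process streaming over the file, which is where the speedup comes from; no xargs or parallel is needed.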