Optimize shell script (bash) to improve performance
I have a bash script which I use to process a text file:
#!/bin/bash
dos2unix sourcefile.txt
cat sourcefile.txt | grep -v '\/' | grep -v '\-\-' | grep -v '#' | grep '[A-Za-z]\*' > modified_sourcefile.txt
mv modified_sourcefile.txt sourcefile.txt
#
# Read the sourcefile file one line by line and iterate...
#
while read line
do
    echo $line | grep -v '\/' | grep -v '\-\-' | grep -v '#'
    if [ $? -eq 0 ]
    then
        # echo "Current Line is " $line ";"
        char1=`echo ${line:0:1}`
        # echo "1st char is " $char1
        if [ -n "$char1" ]
        # if a blank-line, neglect the line.
        then
            # echo "test passed"
            var1=`echo $line | cut -d '*' -f 1`
            var2=`echo $line | cut -d '*' -f 1`
            var3=`echo $line | cut -d - -f 1`
            var4=`echo $line | cut -d '*' -f 1`
            var5=`echo $line | cut -d '*' -f 2`
            var6=`echo $line | cut -d - -f 1`
            var7=`echo $line | cut -d '*' -f 3`
            table1sql="INSERT IGNORE INTO table1 (id,name,active_yesno,category,description,last_modified_by,last_modified_date_time) SELECT ifnull(MAX(id),0)+1,'$var1',1,'$var2','$var3','admin',NOW() FROM table1;"
            echo $table1sql >> result.txt
            privsql="INSERT IGNORE INTO table2 (id,name,description,active_yesno,group_code,last_modified_by,last_modified_date_time) SELECT ifnull(MAX(id),0)+1,'$var1','$var3',1,'$var2','admin',NOW() FROM table2;"
            echo $privsql >> result.txt
            table1privmapsql="INSERT IGNORE INTO table1_table2_map (id,table1_id,table2_id,last_modified_by,last_modified_date_time) SELECT ifnull(MAX(id),0)+1,(select id from table1 where name='$var1'),(select id from table2 where name='$var1'),'admin',NOW() FROM table1_table2_map;"
            echo $table1privmapsql >> result.txt
            privgroupsql="INSERT IGNORE INTO table2_group (id,name,category,active_yesno,last_modified_by,last_modified_date_time) SELECT ifnull(MAX(id),0)+1,'tablegrp','$pgpcode',1,'admin',NOW() FROM table2_group;"
            echo $privgroupsql >> result.txt
            privprivgrpsql="INSERT IGNORE INTO table2_table2group_map (id,table2_id,table2_group_id,last_modified_by,last_modified_date_time) SELECT ifnull(MAX(id),0)+1,(select id from table2 where name='$var1'),(select id from table2_group where name='tablegrp'),'admin',NOW() FROM table2_table2group_map;"
            echo $privprivgrpsql >> result.txt
            rolesql="INSERT IGNORE INTO role (id,name,active_yesno,security_domain_id,last_modified_by,last_modified_date_time) SELECT (select ifnull(MAX(id),0)+1 from role),'$rolename',1,sd.id,'admin',NOW() FROM security_domain sd WHERE sd.name = 'General';"
            echo $rolesql >> result.txt
        fi
    fi
done < "sourcefile.txt"
The thing is, sourcefile.txt has over 11000 lines, so the script takes about 25 minutes to complete :-(.
Is there a better way of doing it?
Contents of sourcefile.txt:
AAA-something*LOCATION-some_where*ABC
To make this script faster you must minimize calls to external commands and use bash built-ins wherever possible.
Read this article to learn what a useless use of external commands is.
Read this article to learn how to use bash to manipulate strings.
Replace the repeated assignments (var1, var2, and var4 all receive the same value) with a single assignment.
While optimizing cut, you can replace
var1=`echo $line | cut -d '*' -f 1`
with
var1="${line%%\**}"
And
var5=`echo $line | cut -d '*' -f 2`
with
var5="${line%\**}"
var5="${var5##*\*}"
It may not be as readable, but it runs much faster than cut.
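On the sample line from the question, these expansions can be checked directly. The var7 line (third field) is an addition by analogy, not part of the answer above:

```shell
#!/bin/bash
line='AAA-something*LOCATION-some_where*ABC'

var1="${line%%\**}"   # strip longest  '*...' suffix -> AAA-something
var5="${line%\**}"    # strip shortest '*...' suffix -> AAA-something*LOCATION-some_where
var5="${var5##*\*}"   # strip longest  '...*' prefix -> LOCATION-some_where
var7="${line##*\*}"   # third field, by analogy      -> ABC

printf '%s\n' "$var1" "$var5" "$var7"
```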
Also,
echo $line | grep -v '\/' | grep -v '\-\-' | grep -v '#'
can be replaced with something like this:
if ! [[ "$line" =~ ([/#]|--) ]]; then
    # all code inside "if [ $? -eq 0 ]"
fi
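Putting the pieces together, here is a minimal sketch of the loop rewritten with bash built-ins only. It assumes the three-field layout shown in the question (AAA-something*LOCATION-some_where*ABC); the SQL bodies are elided:

```shell
#!/bin/bash
# Sketch: the original loop without spawning a single external process.
while IFS= read -r line; do
    [[ -z "$line" ]] && continue                # skip blank lines
    [[ "$line" =~ ([/#]|--) ]] && continue      # skip lines with /, -- or #

    var1="${line%%\**}"                         # cut -d '*' -f 1
    var5="${line%\**}"; var5="${var5##*\*}"     # cut -d '*' -f 2
    var7="${line##*\*}"                         # cut -d '*' -f 3

    # build the INSERT statements here as in the original script, e.g.:
    printf '%s\n' "INSERT ... '$var1' ... '$var5' ... '$var7' ..."
done < sourcefile.txt > result.txt              # one redirect, not one per echo
```

Redirecting once after `done` instead of appending inside the loop also avoids reopening result.txt on every line.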
Shell scripts are inherently slow, especially when they use a lot of external commands like yours does. The biggest reason is that spawning an external process is rather slow, and you do it many times per line.
If you are really after high-performance processing of your data, you should write a Perl or Python script which does what you need without ever spawning an external process: no dos2unix, no grep, no cut, or anything like that.
Perl (and Python) are also perfectly capable of talking directly to the database and inserting the data, again without using external commands.
If you do it right, I predict that processing with Perl will be at least 100x faster than what you have now.
If you are OK with Perl, you can start with something like this and adjust it to your liking:
#!/usr/bin/perl
use strict;
use warnings;

open my $file, '<', 'sourcefile.txt' or die $!;
open my $result, '>>', 'result.txt' or die $!;
while (my $line = <$file>) {
    # ignore lines with /, -- or #:
    next if $line =~ m{/|--|#};
    my ($var1, $var2, $var3, $var4, $var5) =
        ($line =~ /^(\w+)-(\w+)\*(\w+)-(\w+)\*(\w+)/);
    # ignore the line if the regex did not match:
    next unless $var1 and $var2 and $var3 and $var4 and $var5;
    print $result "some sql stmt. using $var1, $var2, etc\n";
    print $result "some other sql using $var1, $var2, etc\n";
    # ...
}
close $result;
close $file;
Before optimizing, profile! Learn how to use the time command. Find out which part of your script takes the most time, and put your effort there.
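As a quick illustration of both bash's built-in time and the cost of process spawning, the same field extraction can be timed with cut (a pipeline per iteration) and with parameter expansion (no external processes):

```shell
#!/bin/bash
# Compare the cost of spawning processes vs. a bash built-in on the
# same trivial task, using the shell's built-in `time`.
line='AAA-something*LOCATION-some_where*ABC'

echo "with cut (a pipeline per iteration):"
time for _ in $(seq 1 200); do
    var1=$(echo "$line" | cut -d '*' -f 1)
done

echo "with parameter expansion (no external processes):"
time for _ in $(seq 1 200); do
    var1="${line%%\**}"
done
```

On most systems the second loop finishes orders of magnitude faster, which is the effect multiplied over 11000 lines.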
Having said that, I would think that making multiple passes of grep will slow things down a bit.
This:
cat sourcefile.txt | grep -v '\/' | grep -v '\-\-' | grep -v '#' | grep '[A-Za-z]\*'
can be replaced by this:
grep '[A-Za-z]\*' sourcefile.txt | grep -v -e '\/' -e '\-\-' -e '#'
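The consolidated pipeline can be sanity-checked on a small sample: only the data line should survive, while comment-like lines are dropped:

```shell
#!/bin/bash
# Build a tiny sample file (contents modeled on the question) and run
# the single-positive-pass, single-negative-pass pipeline on it.
printf '%s\n' 'AAA-something*LOCATION-some_where*ABC' \
              '-- a comment' \
              '# another comment' \
              '/some/path' > sample.txt

grep '[A-Za-z]\*' sample.txt | grep -v -e '/' -e '--' -e '#'
# only the AAA-something*... line remains
```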