Optimizing a shell script (bash) for better performance

Question

I have a bash script which I use to process a text file:

#!/bin/bash

dos2unix sourcefile.txt

cat sourcefile.txt | grep -v '\/' | grep -v '\-\-' | grep -v '#' | grep '[A-Za-z]\*' > modified_sourcefile.txt

mv modified_sourcefile.txt sourcefile.txt
#
# Read the sourcefile file one line by line and iterate...
#

while read line
do

 echo $line | grep -v '\/' | grep -v '\-\-' | grep -v '#'
 if [ $? -eq 0 ]
 then

   # echo "Current Line is " $line ";"
    char1=`echo ${line:0:1}`
   # echo "1st char is " $char1

  if [ -n "$char1" ]
   # if a blank-line, neglect the line.
    then
        # echo "test passed"
        var1=`echo $line | cut -d '*' -f 1`
    var2=`echo $line | cut -d '*' -f 1`
    var3=`echo $line | cut -d - -f 1`
        var4=`echo $line | cut -d '*' -f 1`
        var5=`echo $line | cut -d '*' -f 2`
        var6=`echo $line | cut -d - -f 1`
        var7=`echo $line | cut -d '*' -f 3 `


        table1sql="INSERT IGNORE INTO table1 (id,name,active_yesno,category,description,
           last_modified_by,last_modified_date_time) SELECT ifnull(MAX(id),0)+1,'$var1',1,
           '$var2','$var3','admin',NOW() FROM table1;"

    echo $table1sql >> result.txt


    privsql="INSERT IGNORE INTO table2 (id,name,description,active_yesno,group_code,
             last_modified_by,last_modified_date_time) SELECT ifnull(MAX(id),0)+1,'$var1',
         '$var3',1,'$var2','admin',NOW() FROM table2;"

    echo $privsql >> result.txt     


    table1privmapsql="INSERT IGNORE INTO table1_table2_map (id,table1_id,table2_id,
                  last_modified_by,last_modified_date_time) SELECT ifnull(MAX(id),0)+1,
                  (select id from table1 where name='$var1'),(select id from table2 where name='$var1'),'admin',NOW() FROM table1_table2_map;"
    echo $table1privmapsql >> result.txt

        privgroupsql="INSERT IGNORE INTO table2_group (id,name,category,active_yesno,last_modified_by,
                      last_modified_date_time) SELECT ifnull(MAX(id),0)+1,'tablegrp','$pgpcode',1,'admin',NOW() FROM table2_group;"

        echo $privgroupsql >> result.txt


    privprivgrpsql="INSERT IGNORE INTO table2_table2group_map (id,table2_id,table2_group_id,
                        last_modified_by,last_modified_date_time) SELECT ifnull(MAX(id),0)+1,
                        (select id from table2 where name='$var1'),(select id from table2_group where name='tablegrp'),'admin',NOW() FROM table2_table2group_map;"
        echo $privprivgrpsql >> result.txt              

    rolesql="INSERT IGNORE INTO role (id,name,active_yesno,security_domain_id,last_modified_by,last_modified_date_time) 
                 SELECT (select ifnull(MAX(id),0)+1 from role),'$rolename',1, sd.id ,'admin',NOW() 
                 FROM security_domain sd WHERE sd.name = 'General';"

        echo $rolesql >> result.txt

    fi                  
 fi                        
done < "sourcefile.txt"

The thing is, sourcefile.txt has over 11000 lines, so it takes about 25 minutes to complete :-(

Is there a better way of doing it?

Contents of sourcefile.txt:

AAA-something*LOCATION-some_where*ABC

Answer 1

To make this script faster, you must minimize calls to external commands and use bash itself wherever possible.

Read this article to learn what counts as a useless use of commands.
Read this article to learn how to manipulate strings in bash.
Replace the repeated assignments (var1, var2 and var4 all hold the same value) with a single variable.

When optimizing cut, you can replace

var1=`echo $line | cut -d '*' -f 1`

with

var1="${line%%\**}"

and

var5=`echo $line | cut -d '*' -f 2`

with

var5="${line%\**}"
var5="${var5##*\*}"

It may be less human-readable, but it runs much faster than cut. Note that the two-step var5 extraction assumes exactly three *-separated fields, as in the sample line.
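As a rough sketch of the same idea applied to every field of the sample line from the question, the snippet below uses only parameter expansion. The names field1, field2, field3 and prefix are made up for illustration and are not from the original script, and the field2 step assumes exactly three *-separated fields.

```bash
#!/bin/bash
# Sample line taken from the question.
line='AAA-something*LOCATION-some_where*ABC'

# Field 1: drop the longest suffix that starts with '*'.
field1="${line%%\**}"

# Field 3: drop the longest prefix that ends with '*'.
field3="${line##*\*}"

# Field 2: drop the last field, then everything up to the remaining '*'.
field2="${line%\**}"
field2="${field2##*\*}"

# Equivalent of `cut -d - -f 1`: drop the longest suffix that starts with '-'.
prefix="${line%%-*}"

printf '%s\n' "$field1" "$field2" "$field3" "$prefix"
```

None of these assignments starts a subshell or an external process, which is where the time goes in the original loop.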

Also,

 echo $line | grep -v '\/' | grep -v '\-\-' | grep -v '#'

can be replaced with something like this:

 if [[ "$line" =~ ([/#]|--) ]]; then :; else 
    # all code inside "if [ $? -eq 0 ]"
 fi
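Put together, the filter and the string handling might look roughly like the loop below. This is only a sketch: generate_sql is a hypothetical helper name, the INSERT statement is a shortened placeholder rather than the real SQL from the question, and only the first sample line comes from the question. The regex is kept in a variable, which avoids quoting surprises inside [[ ]].

```bash
#!/bin/bash
# Hypothetical helper: reads lines on stdin, prints one statement per kept line.
generate_sql() {
    local line name
    local skip_re='[/#]|--'
    while IFS= read -r line; do
        # Skip blank lines.
        [[ -z $line ]] && continue
        # Skip lines containing '/', '#' or '--' (replaces the three greps).
        [[ $line =~ $skip_re ]] && continue
        name="${line%%\**}"
        # Placeholder statement, not the full SQL from the question.
        printf "INSERT IGNORE INTO table1 (name) VALUES ('%s');\n" "$name"
    done
}

# Sample input; only the first line is from the question.
printf '%s\n' \
    'AAA-something*LOCATION-some_where*ABC' \
    '# a comment' \
    '' \
    'path/with/slash*X-y*Z' \
    'BBB-other*PLACE-else_where*DEF' | generate_sql
```

With the real file it could be called as generate_sql < sourcefile.txt > result.txt, which also opens result.txt once instead of once per echo.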

Answer 2

Shell scripts are inherently slow, especially when they use a lot of external commands like yours. The biggest reason is that spawning an external process is rather slow, and you do it many times.

If you are really after high-performance processing of your data, you should write a Perl or Python script that does what you need without ever spawning an external process: no dos2unix, no grep, no cut or anything like that.

Perl and Python are also perfectly capable of talking directly to the database and inserting data, again without using external commands.

If you do it right, I predict that processing with Perl will be at least 100x faster than what you have now.

If you are OK with Perl, you can start with something like this and adjust it to your liking:

#!/usr/bin/perl

use strict;
use warnings;

# three-argument open with lexical filehandles
open my $in,  '<',  'sourcefile.txt' or die "sourcefile.txt: $!";
open my $out, '>>', 'result.txt'     or die "result.txt: $!";
while (my $line = <$in>) {
    # ignore lines with /, -- or #:
    next if $line =~ m{/|--|#};
    my ($var1, $var2, $var3, $var4, $var5) =
        ($line =~ /^(\w+)-(\w+)\*(\w+)-(\w+)\*(\w+)/);
    # ignore line if regex did not match (all captures are then undefined):
    next unless defined $var5;
    print $out "some sql stmt. using $var1, $var2, etc\n";
    print $out "some other sql using $var1, $var2, etc\n";
    # ...
}
close $out;
close $in;

Answer 3

Before optimizing, profile! Learn how to use the time command. Find out which part of your script takes the most time, and put your effort there.
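As a minimal sketch of that kind of measurement, the snippet below uses bash's time keyword on two small loops, one calling cut and one using parameter expansion. The loop count and the variable names are arbitrary choices for illustration, and the timings will depend on your machine.

```bash
#!/bin/bash
# Sample line taken from the question.
line='AAA-something*LOCATION-some_where*ABC'
iterations=200

# Variant 1: a subshell plus an external cut process on every iteration.
time for ((i = 0; i < iterations; i++)); do
    with_cut=$(echo "$line" | cut -d '*' -f 1)
done

# Variant 2: pure parameter expansion, no process is spawned.
time for ((i = 0; i < iterations; i++)); do
    with_expansion="${line%%\**}"
done

# Both variants must agree before the timings mean anything.
[ "$with_cut" = "$with_expansion" ] && echo "same result: $with_cut"
```

The timing reports go to stderr. Running time ./yourscript.sh on the whole script gives the overall figure to compare against after each change.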

Having said that, I would think that running multiple passes of grep will slow things down a bit.

This:

cat sourcefile.txt | grep -v '\/' | grep -v '\-\-' | grep -v '#' | grep '[A-Za-z]\*'

can be replaced by this:

grep '[A-Za-z]\*' sourcefile.txt | grep -v -e '\/' -e '\-\-' -e '#'
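A quick way to convince yourself that the shorter pipeline keeps the same lines is to run both on a small sample and compare. The sample data below is invented for the check (only the first line is from the question), and the old and new variable names are just for illustration.

```bash
#!/bin/bash
# Hypothetical sample data; only the first line comes from the question.
sample=$(mktemp)
printf '%s\n' \
    'AAA-something*LOCATION-some_where*ABC' \
    '# comment*line' \
    'path/x*y*z' \
    'a--b*c*d' \
    'no star here' > "$sample"

# Original: cat plus four grep processes.
old=$(cat "$sample" | grep -v '\/' | grep -v '\-\-' | grep -v '#' | grep '[A-Za-z]\*')

# Suggested: two grep processes, no cat.
new=$(grep '[A-Za-z]\*' "$sample" | grep -v -e '\/' -e '\-\-' -e '#')

rm -f "$sample"

if [ "$old" = "$new" ]; then
    echo "pipelines agree:"
    printf '%s\n' "$new"
fi
```

Since this filter runs only once over the file, the saving here is small compared with removing the per-line greps inside the loop.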
