
Very slow script

I have a problem. I need to write a bash script that finds all files and directories in a given path and displays some information about the results. Allowed time: 30 seconds.

#!/bin/bash

DIRS=0
FILES=0
OLD_FILES=0
LARGE_FILES=0
TMP_FILES=0
EXE_FILES=0
IMG_FILES=0
SYM_LINKS=0
TOTAL_BYTES=0

#YEAR_AGO=$(date -d "now - 1 year" +%s)
#SECONDS_IN_YEAR=31536000

function check_dir {
    for entry in "$1"/*
    do
        if [ -d "$entry" ]; then
            ((DIRS+=1))
            check_dir "$entry"
        else if [ -f "$entry" ]; then
                ((FILES+=1))
                #SIZE=$(stat -c%s "$entry")
                #((TOTAL_BYTES+=SIZE))
                #CREATE_DATE=$(date -r "$entry" +%s)
                #CREATE_DATE=$(stat -c%W "$entry")
                #DIFF=$((CREATE_DATE-YEAR_AGO))
                #if [ $DIFF -ge $SECONDS_IN_YEAR ]; then
                #   ((OLD_FILES+=1))
                #fi
             fi

        fi
    done
}

if [ $# -ne 2 ]; then
    echo "Usage: ./srpt path emailaddress"
    exit 1
fi

if [ ! -d "$1" ]; then
    echo "Provided path is invalid"
    exit 1
fi

check_dir "$1"

echo "Execution time $SECONDS"
echo "Dicrecoties $DIRS"
echo "Files $FILES"
echo "Sym links $SYM_LINKS"
echo "Old files $OLD_FILES"
echo "Large files $LARGE_FILES"
echo "Graphics files $IMG_FILES"
echo "Temporary files $TMP_FILES"
echo "Executable files $EXE_FILES"
echo "Total file size $TOTAL_BYTES"

Here is the result of running the script with the lines above commented out:

Execution time 1
Dicrecoties 931
Files 14515
Sym links 0
Old files 0
Large files 0
Graphics files 0
Temporary files 0
Executable files 0
Total file size 0

If I uncomment

SIZE=$(stat -c%s "$entry")
((TOTAL_BYTES+=SIZE))

I get:

Execution time 31
Dicrecoties 931
Files 14515
Sym links 0
Old files 0
Large files 0
Graphics files 0
Temporary files 0
Executable files 0
Total file size 447297022

31 seconds. How can I speed up my script? Finding the files created more than one year ago adds another 30 seconds.

More often than not, using loops in shells is an indication that you're going for the wrong approach.

A shell is above all a tool to run other tools.

Though it can do counting, awk is a better tool for it.

Though it can list and find files, find is better at it.

The best shell scripts are those that get a few tools to contribute to the task, not those that start millions of tools in sequence and leave all the work to the shell.

Here, a better approach would typically be to have find locate the files and gather all the data you need, and have awk digest it and return the statistics. This uses GNU find, GNU awk (for RS='\0') and GNU date (for -d):

find . -printf '%y.%s.%Ts%p\0' |
  awk -v RS='\0' -F'[.]' -v yearago="$(date -d '1 year ago' +%s)" '
    {
      type[$1]++; 
      if ($1 == "f") {
        total_size+=$2
        if ($3 < yearago) old++
        if (!index($NF, "/")) ext[tolower($NF)]++
      }
    }
    END {
      printf("%20s: %d\n", "Directories", type["d"])
      printf("%20s: %d\n", "Total size", total_size)
      printf("%20s: %d\n", "old", old)
      printf("%20s: %d\n", "jpeg", ext["jpg"]+ext["jpeg"])
      printf("%20s: %d\n", "and so on...", 0)
    }'

The key is to avoid firing up too many utilities. You seem to be invoking two or three per file, which will be quite slow.
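To illustrate the cost of per-file processes, here is a minimal sketch that still uses stat but hands it many file names at once through find's "-exec ... {} +", so only a few stat processes are started in total. The function name total_size is made up for this example, and stat -c is assumed to be the GNU coreutils version.

```shell
#!/bin/bash
# total_size DIR: print the summed size in bytes of regular files under DIR.
# "-exec ... {} +" batches many file names into each stat invocation,
# instead of starting one stat process per file.
total_size() {
  find "$1" -type f -exec stat -c %s {} + |
    awk '{ sum += $1 } END { printf "%d\n", sum }'
}
```

It would be called as, for example, total_size /some/path.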

Also, the comments show that handling filenames, in general, is complicated, particularly if the filenames might have spaces and/or newlines in them. But you don't actually need the filenames, if I understand your problem correctly, since you are only using them to collect information.

If you're using GNU find, you can extract the stat information directly from find, which will be considerably more efficient, since find needs to do a stat() on every file anyway. Here's an example, which pipes from find into awk for simplicity:

summary() {
  find "$@" '(' -type f -o -type d ')' -printf '%y %s %C@\n' |
  awk '$1=="d"{DIR+=1;next}
       $1!="f"{next}
       {REG+=1;SIZE+=$2}
       $3<'$(date +%s -d"last year")'{OLD+=1}
       END{printf "Directories: %d\nFiles: %d\nOld files: %d\nTotal Size: %d\n",
                  DIR, REG, OLD, SIZE}'
}

On my machine, that summarised 28718 files in 4817 directories in one-tenth of a second elapsed time. YMMV.
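The same single-pass idea could plausibly be stretched to the other counters in the question. The sketch below is one way to do it with GNU find, awk and GNU date. The function name extended_summary, the 1 MiB threshold for "large", the "any execute bit set" rule for "executable", and the use of modification time (%T@) to decide what is "old" are all assumptions made for illustration; the question does not define them.

```shell
#!/bin/bash
# extended_summary DIR...: one find process and one awk process in total.
# Assumptions (not from the question): "large" means more than 1 MiB,
# "executable" means any execute bit is set, and "old" means not
# modified for a year.
extended_summary() {
  find "$@" -printf '%y %s %m %T@\n' |
    awk -v cutoff="$(date -d '1 year ago' +%s)" '
      { type[$1]++ }                 # d = directory, f = file, l = symlink
      $1 == "f" {
        bytes += $2
        if ($2 > 1048576) large++
        # Look at the last three octal digits only, so setuid and sticky
        # bits are ignored; an odd digit means an execute bit is set.
        if (substr($3, length($3) - 2) ~ /[1357]/) exe++
        if ($4 < cutoff) old++
      }
      END {
        printf "Directories: %d\nFiles: %d\nSym links: %d\n",
               type["d"], type["f"], type["l"]
        printf "Old files: %d\nLarge files: %d\n", old, large
        printf "Executable files: %d\nTotal file size: %d\n", exe, bytes
      }'
}
```

No file names are printed by find here, so names containing spaces or newlines cannot confuse the awk stage.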

You surely want to avoid parsing the output of find as you did (see my comment): it'll break whenever you have spaces in filenames.

You surely want to avoid forking external processes like your $(stat ...) or $(date ...) statements: each fork costs a lot!
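Some of the checks in the question need no external process at all, because bash can do them with builtins. The following is only a sketch: the classify function is hypothetical, the counter names are borrowed from the question, and the lists of "image" and "temporary" extensions are assumptions.

```shell
#!/bin/bash
# Counters in the style of the question's script.
EXE_FILES=0
IMG_FILES=0
TMP_FILES=0

# classify FILE: update the counters for one regular file using only bash
# builtins ([[ ]], case patterns, arithmetic), so nothing is forked.
classify() {
  local entry=$1
  if [[ -x $entry ]]; then
    ((EXE_FILES += 1))
  fi
  # ${entry,,} lowercases the name (bash 4 or later).
  # The extension lists below are assumptions, not taken from the question.
  case ${entry,,} in
    *.jpg | *.jpeg | *.png | *.gif) ((IMG_FILES += 1)) ;;
    *.tmp) ((TMP_FILES += 1)) ;;
  esac
  return 0
}
```

File size and timestamps are a different matter: bash has no builtin for them, which is why those are better collected by find itself.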

It turns out that find on its own can do quite a lot. For example, suppose we want to count the number of files, directories and links.

We all know the naive way (pretty much what you've done):

#!/bin/bash

shopt -s globstar
shopt -s nullglob
shopt -s dotglob
nbfiles=0
nbdirs=0
for f in ./**; do
    [[ -f $f ]] && ((++nbfiles))
    [[ -d $f ]] && ((++nbdirs))
done
echo "There are $nbdirs directories and $nbfiles files, and we're very happy."

Caveat: this method counts links according to what they link to, so a link to a file is counted as a file.
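If links should be counted separately in the pure-bash version, one possible adjustment is to test for a symbolic link first, since -f and -d follow links while -L does not. This is a sketch only; the function name count_tree and its output format are invented for this example.

```shell
#!/bin/bash
# count_tree DIR: like the globstar loop above, but symbolic links are
# counted as links instead of as whatever they point to.
# Note: the shopt settings remain in effect after the function returns.
count_tree() {
  local f nbfiles=0 nbdirs=0 nblinks=0
  shopt -s globstar nullglob dotglob
  for f in "$1"/**; do
    if [[ -L $f ]]; then        # must come first: -f and -d follow links
      ((++nblinks))
    elif [[ -f $f ]]; then
      ((++nbfiles))
    elif [[ -d $f ]]; then
      ((++nbdirs))
    fi
  done
  echo "dirs=$nbdirs files=$nbfiles links=$nblinks"
}
```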

How about the find way? Count the number of files, directories and (symbolic) links:

#!/bin/bash

nbfiles=0
nbdirs=0
nblinks=0
while read t n; do
    case $t in
    dirs) ((nbdirs+=n+1)) ;;
    files) ((nbfiles+=n+1)) ;;
    links) ((nblinks+=n+1)) ;;
    esac
done < <(
    find . -type d -exec bash -c 'echo "dirs $#"' {} + \
         -or -type f -exec bash -c 'echo "files $#"' {} + \
         -or -type l -exec bash -c 'echo "links $#"' {} + 2> /dev/null
)
echo "There are $nbfiles files, $nbdirs dirs and $nblinks links. You're happy to know aren't you?"
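The n+1 in the loop above is needed because of how bash -c assigns its arguments: the first argument after the script text becomes $0, so $# is one short. The snippet below demonstrates this and sketches a common alternative, passing a throwaway placeholder (here _) as $0 so that $# is exact. The function name count_files is hypothetical.

```shell
#!/bin/bash
# With "bash -c SCRIPT a b c", the first argument after the script becomes
# $0, so $# only counts the remaining two.
bash -c 'echo "first=$0 count=$#"' a b c

# count_files DIR: passing "_" as a placeholder for $0 makes $# the exact
# number of file names in each batch, so no +1 correction is needed.
count_files() {
  find "$1" -type f -exec bash -c 'echo "$#"' _ {} + |
    awk '{ n += $1 } END { printf "%d\n", n }'
}
```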

Same principles, using associative arrays, more fields and more involved find logic:

#!/bin/bash

declare -A fields

while read f n; do
    ((fields[$f]+=n))
done < <(
    find . -type d -exec bash -c 'echo "dirs $(($#+1))"' {} + \
        -or -type f -exec bash -c 'echo "files $(($#+1))"' {} + -printf 'size %s\n' \
            \( \
                \( -iname '*.jpg' -printf 'jpg 1\n' -printf 'jpg_size %s\n' \) \
                -or -size +100M -printf 'large 1\n' \
            \) \
        -or -type l -exec bash -c 'echo "links $(($#+1))"' {} + 2> /dev/null
)

for f in "${!fields[@]}"; do
    printf "%s: %s\n" "$f" "${fields[$f]}"
done

I hope this will give you some ideas! Good luck!
