
How to speed up checking if a file exists in bash

I'm new to Bash and wrote a script to check my photo files, but it is very slow and returns a few empty results when checking 17,000+ photos. Is there any way to use all 4 CPUs to run this script and so speed it up?

Please help.

#!/bin/bash
readarray -t array < ~/Scripts/ourphotos.txt
totalfiles="${#array[@]}"
echo $totalfiles
i=0
ii=0
check1=""
while : 
do

check=${array[$i]}
if [[ ! -r $( echo $check ) ]] ; then
    if [ $check = $check1 ]; then
     echo "empty "$check
    else
    unset array[$i]
    ii=$((ii + 1 ))
    fi
fi
if [ $totalfiles = $i ]; then
break
fi
i=$(( i + 1 ))
done 

if [ $ii -gt "1" ]; then
 notify-send -u critical $ii" files have been deleted or are unreadable"
 fi

It's a filesystem operation, so multiple cores will hardly help. Simplification might:

while IFS= read -r file; do
   i=$((i + 1)); [ -e "$file" ] || ii=$((ii + 1))
done < "$HOME/Scripts/ourphotos.txt"
#...

Two points:

  • you don't need to keep the whole file in memory (no arrays needed)
  • $( echo $check ) forks a process. You generally want to avoid forking and execing in loops.
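The two points above can be sketched as a self-contained example. The list file and photo names here are made up for illustration; the loop itself is the same fork-free pattern as the answer's snippet:

```shell
#!/bin/bash
# Minimal sketch: count missing files from a list without forking
# a process per line. The temp list and filenames are illustrative.
tmp=$(mktemp -d)
touch "$tmp/a.jpg" "$tmp/b.jpg"
printf '%s\n' "$tmp/a.jpg" "$tmp/gone.jpg" "$tmp/b.jpg" > "$tmp/list.txt"

total=0
missing=0
# IFS= and -r preserve leading spaces and backslashes in filenames;
# [ -e ... ] is a shell builtin, so no subprocess is spawned per file.
while IFS= read -r file; do
  total=$((total + 1))
  [ -e "$file" ] || missing=$((missing + 1))
done < "$tmp/list.txt"

echo "missing $missing of $total"
rm -rf "$tmp"
```

Because nothing in the loop body forks, this scales to tens of thousands of lines at the speed of the underlying `stat` calls.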

This is an old question, but a common problem lacking an evidence-based solution.

awk '{print "[ -e "$1" ] && echo "$2}' | parallel    # 400 files/s
awk '{print "[ -e "$1" ] && echo "$2}' | bash        # 6000 files/s
while read file; do [ -e $file ] && echo $file; done # 12000 files/s
xargs find                                           # 200000 files/s
parallel --xargs find                                # 250000 files/s
xargs -P2 find                                       # 400000 files/s
xargs -P96 find                                      # 800000 files/s

I tried this on a few different systems and the results were not consistent, but xargs -P (parallel execution) was consistently the fastest. I was surprised that xargs -P was faster than GNU parallel (not reported above, but sometimes much faster), and I was surprised that parallel execution helped so much; I thought file I/O would be the limiting factor and that parallel execution wouldn't matter much.

Also noteworthy is that xargs find is about 20x faster than the accepted solution, and much more concise. For example, here is a rewrite of OP's script:

#!/bin/bash

total=$(wc -l < ~/Scripts/ourphotos.txt)

# tr '\n' '\0' | xargs -0 handles spaces and other funny characters in filenames
found=$(tr '\n' '\0' < ~/Scripts/ourphotos.txt | xargs -0 -P4 find | wc -l)

if [ "$total" -ne "$found" ]; then
  ii=$((total - found))
  notify-send -u critical "$ii files have been deleted or are unreadable"
fi
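The counting trick can be tried on a throwaway list before pointing it at a real photo index. This sketch uses made-up filenames; one caveat worth noting is that find recurses into directories, so the line counts only match when the list contains regular files:

```shell
#!/bin/bash
# Sketch of the xargs/find counting approach on a temporary list.
# Filenames are illustrative; "missing.jpg" is deliberately absent.
tmp=$(mktemp -d)
touch "$tmp/a.jpg" "$tmp/b.jpg"
printf '%s\n' "$tmp/a.jpg" "$tmp/missing.jpg" "$tmp/b.jpg" > "$tmp/list.txt"

total=$(grep -c '' "$tmp/list.txt")   # line count as a bare number

# find prints each existing path on its own line; complaints about
# missing paths go to stderr, which we discard here.
found=$(tr '\n' '\0' < "$tmp/list.txt" | xargs -0 -P2 find 2>/dev/null | wc -l)

echo "total=$total found=$found missing=$((total - found))"
rm -rf "$tmp"
```

With three listed paths and one missing file, this reports total=3, found=2, missing=1.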
