
How to speed up checking if a file exists in bash

I'm new to Bash and wrote a script to check my photo files, but it is very slow and returns a few empty results when checking 17,000+ photos. Is there any way to use all 4 CPUs to run this script and so speed it up?

Please help.

#!/bin/bash
readarray -t array < ~/Scripts/ourphotos.txt
totalfiles="${#array[@]}"
echo $totalfiles
i=0
ii=0
check1=""
while : 
do

check=${array[$i]}
if [[ ! -r $( echo $check ) ]] ; then
    if [ $check = $check1 ]; then
     echo "empty "$check
    else
    unset array[$i]
    ii=$((ii + 1 ))
    fi
fi
if [ $totalfiles = $i ]; then
break
fi
i=$(( i + 1 ))
done 

if [ $ii -gt "1" ]; then
 notify-send -u critical $ii" files have been deleted or are unreadable"
 fi

It's a filesystem operation, so multiple cores will hardly help. Simplification might:

while IFS= read -r file; do
   i=$((i + 1)); [ -e "$file" ] || ii=$((ii + 1))
done < "$HOME/Scripts/ourphotos.txt"
#...

Two points:

  • you don't need to keep the whole file in memory (no arrays needed)
  • $( echo $check ) forks a process. You generally want to avoid forking and execing in loops.
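The two points above can be sketched as a self-contained example. The list file and photo names here are made up for illustration; the loop itself is the same fork-free pattern as the answer's snippet:

```shell
#!/bin/bash
# Minimal sketch: count missing files from a list without forking
# a process per line. The temp list and filenames are illustrative.
tmp=$(mktemp -d)
touch "$tmp/a.jpg" "$tmp/b.jpg"
printf '%s\n' "$tmp/a.jpg" "$tmp/gone.jpg" "$tmp/b.jpg" > "$tmp/list.txt"

total=0
missing=0
# IFS= and -r preserve leading spaces and backslashes in filenames;
# [ -e ... ] is a shell builtin, so no subprocess is spawned per file.
while IFS= read -r file; do
  total=$((total + 1))
  [ -e "$file" ] || missing=$((missing + 1))
done < "$tmp/list.txt"

echo "missing $missing of $total"
rm -rf "$tmp"
```

Because nothing in the loop body forks, this scales to tens of thousands of lines at the speed of the underlying `stat` calls.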

This is an old question, but a common problem lacking an evidence-based solution.

awk '{print "[ -e "$1" ] && echo "$2}' | parallel    # 400 files/s
awk '{print "[ -e "$1" ] && echo "$2}' | bash        # 6000 files/s
while read file; do [ -e $file ] && echo $file; done # 12000 files/s
xargs find                                           # 200000 files/s
parallel --xargs find                                # 250000 files/s
xargs -P2 find                                       # 400000 files/s
xargs -P96 find                                      # 800000 files/s

I tried this on a few different systems and the results were not consistent, but xargs -P (parallel execution) was consistently the fastest. I was surprised that xargs -P was faster than GNU parallel (not reported above, but sometimes much faster), and I was surprised that parallel execution helped so much; I thought file I/O would be the limiting factor and that parallel execution wouldn't matter much.

Also noteworthy is that xargs find is about 20x faster than the accepted solution, and much more concise. For example, here is a rewrite of OP's script:

#!/bin/bash

total=$(wc -l < ~/Scripts/ourphotos.txt)

# tr '\n' '\0' | xargs -0 handles spaces and other funny characters in filenames
found=$(tr '\n' '\0' < ~/Scripts/ourphotos.txt | xargs -0 -P4 find | wc -l)

if [ "$total" -ne "$found" ]; then
  ii=$((total - found))
  notify-send -u critical "$ii files have been deleted or are unreadable"
fi
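The counting trick can be tried on a throwaway list before pointing it at a real photo index. This sketch uses made-up filenames; one caveat worth noting is that find recurses into directories, so the line counts only match when the list contains regular files:

```shell
#!/bin/bash
# Sketch of the xargs/find counting approach on a temporary list.
# Filenames are illustrative; "missing.jpg" is deliberately absent.
tmp=$(mktemp -d)
touch "$tmp/a.jpg" "$tmp/b.jpg"
printf '%s\n' "$tmp/a.jpg" "$tmp/missing.jpg" "$tmp/b.jpg" > "$tmp/list.txt"

total=$(grep -c '' "$tmp/list.txt")   # line count as a bare number

# find prints each existing path on its own line; complaints about
# missing paths go to stderr, which we discard here.
found=$(tr '\n' '\0' < "$tmp/list.txt" | xargs -0 -P2 find 2>/dev/null | wc -l)

echo "total=$total found=$found missing=$((total - found))"
rm -rf "$tmp"
```

With three listed paths and one missing file, this reports total=3, found=2, missing=1.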
