I have written a bash function that separates out the image files that are larger than a given size. I have around 20,000 files, though, and it takes too long while barely using the CPU. Is it possible to optimize it a little without complicated multi-processing? (I don't mind multi-processing, but I don't want to write 20 lines of code for such a simple task.)
Here is my code:
getpics() {
dir="larger than $1x$2"
mkdir "$dir"
for f in `ls *`; do
a=`file "$f" | grep -Po ", \K[\d]*x[\d]*"`
x=`grep -Po "\d*(?=x)" <<< "$a"`
y=`grep -Po "x\K\d*" <<< "$a"`
echo "$a _______________________ $x, $y"
if [ $x -gt $1 ] && [ $y -gt $2 ] ; then
mv "$f" "$dir/$f"
fi
done
}
You can try avoiding calls to external tools as much as possible and using bash built-ins instead.
For example, to replace all the grep calls, you can use bash ERE matching (works in Bash 4+):
re='^.* ([0-9]+)x([0-9]+),.*$'
for f in *; do
desc=$(file "$f")
if [[ $desc =~ $re ]]; then
x=${BASH_REMATCH[1]}
y=${BASH_REMATCH[2]}
# ... check size & move
fi
done
I got rid of the ls and grep calls by using Bash regex (thanks to the comments below and @randomir's answer). Refactored script:
re=', ([0-9]+)x([0-9]+)'
getpics() {
dir="larger than $1x$2"
mkdir "$dir"
for f in *; do
if [[ $(file "$f") =~ $re ]]; then
x=${BASH_REMATCH[1]}
y=${BASH_REMATCH[2]}
# $a is no longer set in this version, so rebuild the WxH string from the captures
echo "${x}x${y} _______________________ $x, $y"
(( x > $1 && y > $2 )) && mv "$f" "$dir/$f"
fi
done
}
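Since spawning file once per image is still the one external call left in the loop, a further step in the same direction would be to call file a single time for all files. The following is only a sketch under assumptions: getpics_batch is a made-up name, it relies on file -b printing exactly one description line per argument in argument order, and it needs Bash 4+ for mapfile. I have not benchmarked it.

```bash
# Sketch: one `file` invocation for the whole directory instead of one per image.
# getpics_batch is a hypothetical name; the regex is the one from the script above.
getpics_batch() {
    local min_w=$1 min_h=$2
    local re=', ([0-9]+)x([0-9]+)'
    local dir="larger than ${min_w}x${min_h}"
    local i
    local -a files descs

    files=( * )
    mkdir -p "$dir"

    # -b drops the "name: " prefix, so line i of the output is assumed to be
    # the description of files[i]. "--" protects names that start with a dash.
    mapfile -t descs < <(file -b -- "${files[@]}")

    for i in "${!files[@]}"; do
        if [[ ${descs[i]} =~ $re ]] &&
           (( BASH_REMATCH[1] > min_w && BASH_REMATCH[2] > min_h )); then
            mv -- "${files[i]}" "$dir/"
        fi
    done
}
```

It is called the same way as the original, for example getpics_batch 800 600. With a very large number of long file names the single command line could exceed the system's argument length limit, in which case the list would have to be processed in chunks.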
First, let's do some benchmarks.
We start with the if:
$ time for i in `seq 1 100000`; do if [ 2 -gt 1 ] && [ 3 -gt 2 ]; then a=1; fi; done
real 0m0.694s
user 0m0.693s
sys 0m0.003s
$ time for i in `seq 1 100000`; do if [[ 2 -gt 1 && 3 -gt 2 ]]; then a=1; fi; done
real 0m0.428s
user 0m0.424s
sys 0m0.006s
$ time for i in `seq 1 100000`; do if (( 2 > 1 && 3 > 2 )); then a=1; fi; done
real 0m0.366s
user 0m0.364s
sys 0m0.003s
$ time for i in `seq 1 100000`; do (( 2 > 1 && 3 > 2 )) && a=1; done
real 0m0.355s
user 0m0.352s
sys 0m0.005s
Now let's look at ls:
$ time for i in `ls *`; do a=1; done
real 0m0.280s
user 0m0.249s
sys 0m0.036s
$ time for i in *; do a=1; done
real 0m0.128s
user 0m0.128s
sys 0m0.000s
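Speed aside, the glob is also the safer of the two, because the output of ls is split on whitespace before the loop sees it. A small demonstration (count_ls and count_glob are made-up helper names for this example only):

```bash
# Count loop iterations over the current directory in both styles.
# count_ls and count_glob are hypothetical helpers for illustration.
count_ls() {
    local n=0 f
    for f in $(ls *); do n=$((n + 1)); done   # unquoted: word-split on spaces
    echo "$n"
}

count_glob() {
    local n=0 f
    for f in *; do n=$((n + 1)); done         # one iteration per matching name
    echo "$n"
}
```

In a directory that contains only a file named "my photo.jpg", count_ls prints 2 (it sees "my" and "photo.jpg"), while count_glob prints 1.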
Now some people might wonder if
desc=$(file "$f")
if [[ $desc =~ $re ]]; then
would be any different from
if [[ $(file "$f") =~ $re ]]; then
There is no difference in the result. I timed both many times, and each time one or the other was randomly faster, so I'm not including those timings here because I don't think they are useful.
Again you might wonder if there is a difference between
^.* ([0-9]+)x([0-9]+),.*$
And ([0-9]+)x([0-9]+),
I tested it and there is none. However, according to regex101, these are the step counts (preserving the groupings), best first:
.*, ([0-9]+)x([0-9]*) : 33 steps.
, ([0-9]+)x([0-9]+) : 34 steps.
^.* ([0-9]+)x([0-9]+),.*$ : 38 steps.
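To check that the three patterns really capture the same thing in bash itself, something like the following can be used. It is a sketch: show_captures is a made-up helper, and the sample line is only an assumption about what file prints for a JPEG (the exact wording varies between versions). Also keep in mind that, as far as I know, bash uses the C library's regex engine rather than any of the engines regex101 offers, so the step counts are only a rough guide.

```bash
# Assumed sample of `file` output for a JPEG; real output differs by version.
sample='JPEG image data, baseline, precision 8, 1920x1080, components 3'

# show_captures is a hypothetical helper: print both capture groups of a pattern.
show_captures() {
    local re=$1 text=$2
    if [[ $text =~ $re ]]; then
        echo "${BASH_REMATCH[1]} ${BASH_REMATCH[2]}"
    else
        echo "no match"
    fi
}

# Run all three patterns from the comparison above against the same line.
for re in '.*, ([0-9]+)x([0-9]*)' ', ([0-9]+)x([0-9]+)' '^.* ([0-9]+)x([0-9]+),.*$'; do
    show_captures "$re" "$sample"
done
```

For this sample line, each of the three patterns prints 1920 1080.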
Now let's compare different ways of getting x and y:
$ time (files=( * ); for f in "${files[@]:0:1000}"; do IFS=, a=(`file $f`);IFS=x b=(${a[8]});done;)
real 0m5.580s
user 0m1.147s
sys 0m4.498s
$ time (files=( * ); for f in "${files[@]:0:1000}"; do if [[ $(file "$f") =~ $re ]]; then x=${BASH_REMATCH[1]}; y=${BASH_REMATCH[2]}; fi; done)
real 0m5.817s
user 0m1.234s
sys 0m4.619s
$ time (files=( * ); for f in "${files[@]:0:1000}"; do a=(`convert $f -print "%w %h\n" /dev/null`);done;)
real 0m10.356s
user 0m3.624s
sys 0m6.793s
$ time (files=( * ); for f in "${files[@]:0:1000}"; do a=$(file "$f" | grep -Po ", \K\d+x\d+"); IFS=x read x y <<<"$a"; done;)
real 0m12.645s
user 0m2.235s
sys 0m13.914s