I have around a million files in one folder, in the form xxxx_description.jpg, where xxxx is a number ranging from 100 up to an unknown upper bound.
The list is similar to this:
146467_description1.jpg
146467_description2.jpg
146467_description3.jpg
146467_description4.jpg
14646_description1.jpg
14646_description2.jpg
14646_description3.jpg
146472_description1.jpg
146472_description2.jpg
146472_description3.jpg
146500_description1.jpg
146500_description2.jpg
146500_description3.jpg
146500_description4.jpg
146500_description5.jpg
146500_description6.jpg
To get the file count in that folder down, I'd like to put them all into folders grouped by the number at the start.
i.e.:
146467/146467_description1.jpg
146467/146467_description2.jpg
146467/146467_description3.jpg
146467/146467_description4.jpg
14646/14646_description1.jpg
14646/14646_description2.jpg
14646/14646_description3.jpg
146472/146472_description1.jpg
146472/146472_description2.jpg
146472/146472_description3.jpg
146500/146500_description1.jpg
146500/146500_description2.jpg
146500/146500_description3.jpg
146500/146500_description4.jpg
146500/146500_description5.jpg
146500/146500_description6.jpg
I was thinking of using something on the command line like find | awk | mv, or maybe writing a script, but I'm not sure how to do this most efficiently.
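Something along these lines is roughly what I had in mind, completely untested (it assumes GNU find's -printf and that the filenames contain no whitespace):

find . -maxdepth 1 -name '[0-9]*_*.jpg' -printf '%f\n' |
awk -F_ '{ printf "mkdir -p \"%s\" && mv \"%s\" \"%s/\"\n", $1, $0, $1 }' |
sh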
You can use this script:
for i in [0-9]*_*.jpg; do
    # strip everything after the leading digits to get the directory name
    p=`echo "$i" | sed 's/^\([0-9]*\)_.*/\1/'`
    mkdir -p "$p"
    mv "$i" "$p"
done
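With a million files, the echo | sed pipeline spawns extra processes for every single file, so a variant that does the same thing with only the shell's parameter expansion may be noticeably faster. This is just a sketch along the same lines, assuming a POSIX shell:

for i in [0-9]*_*.jpg; do
    p=${i%%_*}   # everything before the first underscore, i.e. the leading number
    mkdir -p "$p"
    mv "$i" "$p"/
done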
Using grep
for file in *.jpg; do
    dirName=$(echo "$file" | grep -oE '^[0-9]+')
    [[ -d $dirName ]] || mkdir "$dirName"
    mv "$file" "$dirName"
done
grep -oE '^[0-9]+' extracts the starting digits of each filename, i.e.
146467
146467
146467
146467
14646
...
[[ -d $dirName ]] succeeds (exit status 0) if the directory already exists, so
[[ -d $dirName ]] || mkdir "$dirName"
ensures that the mkdir runs only if the test [[ -d $dirName ]] fails, that is, if the directory does not exist.
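With this many files it may be worth a dry run first; one possible way (just a sketch of the same loop) is to echo the commands instead of running them and eyeball the output:

for file in *.jpg; do
    dirName=$(echo "$file" | grep -oE '^[0-9]+')
    # print what would be done, without touching anything
    echo mkdir -p "$dirName"
    echo mv "$file" "$dirName"
done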
If you really are dealing with millions of files, I suspect that a glob (*.jpg or [0-9]*_*.jpg) may fail because it makes a command line that's too long for the shell. If that's the case, you can still use find. Something like this might work:
find /path -name "[0-9]*_*.jpg" -exec sh -c 'f="{}"; mkdir -p "/target/${f%_*}"; mv "$f" "/target/${f%_*}/"' \;
Broken out for easier reading, this is what we're doing:
find /path - run find, with /path as a starting point,
-name "[0-9]*_*.jpg" - match files that match this filespec in all directories,
-exec sh -c - execute the following on each file...
'f="{}"; - put the filename into a variable...
mkdir -p "/target/${f%_*}"; - make a target directory based on that variable (read mkdir's man page about the -p option),
mv "$f" "/target/${f%_*}/"' - move the file into the directory,
\; - end the -exec expression.
On the up side, it can handle any number of files that find can handle (i.e. limited only by your OS). On the down side, it's launching a separate shell for each file to be handled.
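If that per-file shell turns out to be the bottleneck, one possible refinement (a sketch I haven't tested, assuming your find supports -exec ... {} +) is to hand the files to the shell in batches and loop over them inside it:

find /path -name "[0-9]*_*.jpg" -exec sh -c '
    for f in "$@"; do
        # same parameter expansion as above: strip the trailing _suffix
        mkdir -p "/target/${f%_*}"
        mv "$f" "/target/${f%_*}/"
    done
' sh {} +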
Note that the above answer is for Bourne/POSIX/Bash. If you're using CSH or TCSH as your shell, the following might work instead:
#!/bin/tcsh
foreach f (*_*.jpg)
    # :as/_/ / replaces every underscore with a space, so $split becomes
    # a word list whose first element is the leading number
    set split = ($f:as/_/ /)
    mkdir -p "$split[1]"
    mv "$f" "$split[1]/"
end
This assumes that the filespec will fit in tcsh's glob buffer (I've tested with 40000 files, an 894KB command line, and not had a problem using /bin/sh or /bin/csh in FreeBSD). Like the Bourne/POSIX/Bash parameter expansion solution above, this avoids unnecessary calls to external programs. I haven't tested it as thoroughly, though, and would still recommend the find solution even though it's slower.