
Script to group numbered files into folders

I have around a million files in one folder in the form xxxx_description.jpg, where xxxx is a number ranging from 100 to an unknown upper bound.

The list is similar to this:

146467_description1.jpg
146467_description2.jpg
146467_description3.jpg
146467_description4.jpg
14646_description1.jpg
14646_description2.jpg
14646_description3.jpg
146472_description1.jpg
146472_description2.jpg
146472_description3.jpg
146500_description1.jpg
146500_description2.jpg
146500_description3.jpg
146500_description4.jpg
146500_description5.jpg
146500_description6.jpg

To get the file count in that folder down, I'd like to put them all into folders grouped by the number at the start.

i.e.:

146467/146467_description1.jpg
146467/146467_description2.jpg
146467/146467_description3.jpg
146467/146467_description4.jpg
14646/14646_description1.jpg
14646/14646_description2.jpg
14646/14646_description3.jpg
146472/146472_description1.jpg
146472/146472_description2.jpg
146472/146472_description3.jpg
146500/146500_description1.jpg
146500/146500_description2.jpg
146500/146500_description3.jpg
146500/146500_description4.jpg
146500/146500_description5.jpg
146500/146500_description6.jpg

I was thinking of trying something on the command line (a find | awk | mv pipeline, perhaps), or maybe writing a script, but I'm not sure how to do this most efficiently.

You can use this script:

for i in [0-9]*_*.jpg; do
   p=$(echo "$i" | sed 's/^\([0-9]*\)_.*/\1/')
   mkdir -p "$p"
   mv "$i" "$p"
done
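Since the directory name is simply everything before the first underscore, you can also extract it with shell parameter expansion and avoid spawning a `sed` process for every file. A minimal sketch (the sample filenames and throwaway directory are for demonstration only; untested at the million-file scale):

```shell
#!/bin/sh
# Demo runs in a throwaway directory with a few sample files.
demo=$(mktemp -d)
cd "$demo" || exit 1
touch 146467_description1.jpg 146467_description2.jpg 14646_description1.jpg

for i in [0-9]*_*.jpg; do
  [ -e "$i" ] || continue   # skip if the glob matched nothing
  p=${i%%_*}                # strip everything from the first "_" onward
  mkdir -p "$p"
  mv "$i" "$p/"
done
```

Because `${i%%_*}` is expanded by the shell itself, the loop body forks no external programs other than `mkdir` and `mv`, which matters when it runs a million times.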

Using grep

for file in *.jpg; do
    dirName=$(echo "$file" | grep -oE '^[0-9]+')
    [[ -d $dirName ]] || mkdir "$dirName"
    mv "$file" "$dirName"
done

grep -oE '^[0-9]+' extracts the leading digits of each filename:

146467
146467
146467
146467
14646
...

[[ -d $dirName ]] exits with status 0 (success) if the directory exists, and non-zero otherwise

[[ -d $dirName ]] || mkdir $dirName ensures that mkdir runs only if the test [[ -d $dirName ]] fails, i.e. the directory does not exist
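Note that the test-then-mkdir pair can be collapsed into a single command: `mkdir -p` succeeds whether or not the directory already exists. A quick sketch (the temporary path is just for demonstration):

```shell
#!/bin/sh
# "mkdir -p" never fails merely because the directory already exists,
# so the existence test can be dropped entirely.
d=$(mktemp -d)/146467
mkdir -p "$d"   # creates the directory
mkdir -p "$d"   # second call is a no-op and still exits 0
```

With `mkdir -p "$dirName"` in the loop, the `[[ -d $dirName ]] ||` guard becomes unnecessary.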

If you really are dealing with a million files, I suspect that a glob (*.jpg or [0-9]*_*.jpg) may fail because it expands to a command line that's too long for the shell. If that's the case, you can still use find. Something like this might work:

find /path -name "[0-9]*_*.jpg" -exec sh -c 'f=$(basename "$1"); mkdir -p "/target/${f%%_*}"; mv "$1" "/target/${f%%_*}/"' sh {} \;

Broken out for easier reading, this is what we're doing:

  • find /path - run find, with /path as a starting point,
  • -name "[0-9]*_*.jpg" - match files that match this filespec in all directories,
  • -exec sh -c '...' sh {} \; - run the quoted script once per file, passing the pathname as $1 (safer than embedding {} inside the script itself, which breaks on filenames containing quotes)...
    • f=$(basename "$1"); - strip the leading path so only the filename remains...
    • mkdir -p "/target/${f%%_*}"; - make a target directory named after everything before the first underscore (read mkdir's man page about the -p option)
    • mv "$1" "/target/${f%%_*}/" - move the file into that directory.
    • \; - end the -exec expression

On the up side, it can handle any number of files that find can handle (i.e. limited only by your OS). On the down side, it launches a separate shell for each file.
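One way to cut down on the per-file shell launches is find's `-exec ... {} +` form, which hands each invoked shell as many pathnames as fit on one command line and loops over them as positional parameters. A sketch under those assumptions (the throwaway source and target directories and sample filenames are for demonstration only):

```shell
#!/bin/sh
# Batch version: "-exec ... {} +" starts one shell per batch of files
# rather than one shell per file.
src=$(mktemp -d); target=$(mktemp -d)
touch "$src/146467_description1.jpg" "$src/14646_description1.jpg"

find "$src" -name "[0-9]*_*.jpg" -exec sh -c '
  target=$1; shift            # first argument is the destination root
  for f do                    # remaining arguments are the batch of files
    b=$(basename "$f")        # e.g. 146467_description1.jpg
    d="$target/${b%%_*}"      # e.g. .../146467
    mkdir -p "$d"
    mv "$f" "$d/"
  done
' sh "$target" {} +
```

For a million files this reduces the number of shells started from millions to (roughly) the number of batches that fit within the OS argument-length limit.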

Note that the above answer is for Bourne/POSIX/Bash shells. If you're using csh or tcsh as your shell, the following might work instead:

#!/bin/tcsh

foreach f (*_*.jpg)
  set split = ($f:as/_/ /)
  mkdir -p "$split[1]"
  mv "$f" "$split[1]/"
end

This assumes that the filespec will fit in tcsh's glob buffer. I've tested with 40000 files (894KB of command line) and not had a problem using /bin/sh or /bin/csh in FreeBSD. Like the parameter expansion used in the find solution above, the :as modifier avoids unnecessary calls to external programs. I haven't tested this with a million files, though, and would recommend the find solution even though it's slower.
