
Script to group numbered files into folders

I have around a million files in one folder in the form xxxx_description.jpg, where xxxx is a number ranging from 100 to an unknown upper bound.

The list is similar to this:

146467_description1.jpg
146467_description2.jpg
146467_description3.jpg
146467_description4.jpg
14646_description1.jpg
14646_description2.jpg
14646_description3.jpg
146472_description1.jpg
146472_description2.jpg
146472_description3.jpg
146500_description1.jpg
146500_description2.jpg
146500_description3.jpg
146500_description4.jpg
146500_description5.jpg
146500_description6.jpg

To get the file count in that folder down, I'd like to put them all into folders grouped by the number at the start.

i.e.:

146467/146467_description1.jpg
146467/146467_description2.jpg
146467/146467_description3.jpg
146467/146467_description4.jpg
14646/14646_description1.jpg
14646/14646_description2.jpg
14646/14646_description3.jpg
146472/146472_description1.jpg
146472/146472_description2.jpg
146472/146472_description3.jpg
146500/146500_description1.jpg
146500/146500_description2.jpg
146500/146500_description3.jpg
146500/146500_description4.jpg
146500/146500_description5.jpg
146500/146500_description6.jpg

I was thinking of trying something on the command line (a find | awk | mv pipeline, perhaps), or maybe writing a script, but I'm not sure how to do this most efficiently.

You can use this script:

for i in [0-9]*_*.jpg; do
   p=$(echo "$i" | sed 's/^\([0-9]*\)_.*/\1/')
   mkdir -p "$p"
   mv "$i" "$p"
done
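Since the directory name is simply everything before the first underscore, you can also extract it with shell parameter expansion and avoid spawning a `sed` process for every file. A minimal sketch (the sample filenames and throwaway directory are for demonstration only; untested at the million-file scale):

```shell
#!/bin/sh
# Demo runs in a throwaway directory with a few sample files.
demo=$(mktemp -d)
cd "$demo" || exit 1
touch 146467_description1.jpg 146467_description2.jpg 14646_description1.jpg

for i in [0-9]*_*.jpg; do
  [ -e "$i" ] || continue   # skip if the glob matched nothing
  p=${i%%_*}                # strip everything from the first "_" onward
  mkdir -p "$p"
  mv "$i" "$p/"
done
```

Because `${i%%_*}` is expanded by the shell itself, the loop body forks no external programs other than `mkdir` and `mv`, which matters when it runs a million times.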

Using grep

for file in *.jpg; do
    dirName=$(echo "$file" | grep -oE '^[0-9]+')
    [[ -d $dirName ]] || mkdir "$dirName"
    mv "$file" "$dirName"
done

grep -oE '^[0-9]+' extracts the leading digits of each filename:

146467
146467
146467
146467
14646
...

[[ -d $dirName ]] exits with status 0 (success) if the directory exists, and non-zero otherwise

[[ -d $dirName ]] || mkdir $dirName ensures that mkdir runs only if the test [[ -d $dirName ]] fails, i.e. the directory does not exist
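Note that the test-then-mkdir pair can be collapsed into a single command: `mkdir -p` succeeds whether or not the directory already exists. A quick sketch (the temporary path is just for demonstration):

```shell
#!/bin/sh
# "mkdir -p" never fails merely because the directory already exists,
# so the existence test can be dropped entirely.
d=$(mktemp -d)/146467
mkdir -p "$d"   # creates the directory
mkdir -p "$d"   # second call is a no-op and still exits 0
```

With `mkdir -p "$dirName"` in the loop, the `[[ -d $dirName ]] ||` guard becomes unnecessary.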

If you really are dealing with a million files, I suspect that a glob (*.jpg or [0-9]*_*.jpg) may fail because it expands to a command line that's too long for the shell. If that's the case, you can still use find. Something like this might work:

find /path -name "[0-9]*_*.jpg" -exec sh -c 'f=$(basename "$1"); mkdir -p "/target/${f%%_*}"; mv "$1" "/target/${f%%_*}/"' sh {} \;

Broken out for easier reading, this is what we're doing:

  • find /path - run find, with /path as a starting point,
  • -name "[0-9]*_*.jpg" - match files that match this filespec in all directories,
  • -exec sh -c '...' sh {} \; - run the quoted script once per file, passing the pathname as $1 (safer than embedding {} inside the script itself, which breaks on filenames containing quotes)...
    • f=$(basename "$1"); - strip the leading path so only the filename remains...
    • mkdir -p "/target/${f%%_*}"; - make a target directory named after everything before the first underscore (read mkdir's man page about the -p option)
    • mv "$1" "/target/${f%%_*}/" - move the file into that directory.
    • \; - end the -exec expression

On the up side, it can handle any number of files that find can handle (i.e. limited only by your OS). On the down side, it launches a separate shell for each file.
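One way to cut down on the per-file shell launches is find's `-exec ... {} +` form, which hands each invoked shell as many pathnames as fit on one command line and loops over them as positional parameters. A sketch under those assumptions (the throwaway source and target directories and sample filenames are for demonstration only):

```shell
#!/bin/sh
# Batch version: "-exec ... {} +" starts one shell per batch of files
# rather than one shell per file.
src=$(mktemp -d); target=$(mktemp -d)
touch "$src/146467_description1.jpg" "$src/14646_description1.jpg"

find "$src" -name "[0-9]*_*.jpg" -exec sh -c '
  target=$1; shift            # first argument is the destination root
  for f do                    # remaining arguments are the batch of files
    b=$(basename "$f")        # e.g. 146467_description1.jpg
    d="$target/${b%%_*}"      # e.g. .../146467
    mkdir -p "$d"
    mv "$f" "$d/"
  done
' sh "$target" {} +
```

For a million files this reduces the number of shells started from millions to (roughly) the number of batches that fit within the OS argument-length limit.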

Note that the above answer is for Bourne/POSIX/Bash shells. If you're using csh or tcsh as your shell, the following might work instead:

#!/bin/tcsh

foreach f (*_*.jpg)
  set split = ($f:as/_/ /)
  mkdir -p "$split[1]"
  mv "$f" "$split[1]/"
end

This assumes that the filespec will fit in tcsh's glob buffer. I've tested with 40000 files (894KB of command line) and not had a problem using /bin/sh or /bin/csh in FreeBSD. Like the parameter expansion used in the find solution above, the :as modifier avoids unnecessary calls to external programs. I haven't tested this with a million files, though, and would recommend the find solution even though it's slower.
