
Find Filename with the Largest Number

I have a folder with 1-4 million files. Each file has this format:

trial_nubyb_$i_out.html

where $i is a number from 1 onwards

How do I get the 5 highest-numbered files in the folder? I only need the 5 largest values of $i, not the full trial_nubyb_$i_out.html filenames, although full filenames are fine too.

Running "ls -la | tail -5" doesn't work because the filenames are sorted alphabetically rather than numerically, so the last 5 are actually:

trial_nubyb_999998_out.html
trial_nubyb_999999_out.html
trial_nubyb_99999_out.html
trial_nubyb_9999_out.html
trial_nubyb_999_out.html

I am using bash on Ubuntu.

I would prefer a simple bash solution, but if that is too complicated, PHP is also welcome.

This answer applies to ls from the GNU core utilities, which is what Ubuntu uses. ls is not part of bash itself, and you would see different output on, e.g., macOS.

You can add the -v option to get a "natural sort of (version) numbers within text":

ls -lav | tail -5

ls will then sort "trial_nubyb_10_out.html" after "trial_nubyb_9_out.html":

bash-4.4$ ls -la
total 8
drwxrwxrwx 1 cg cg 4096 Nov 12 12:16 .
drwxrwxrwx 1 cg cg 4096 Sep 9 10:53 ..
bash-4.4$ touch trial_nubyb_{1,9,10,99,219}_out.html
bash-4.4$ ls -la
total 8
drwxrwxrwx 1 cg cg 4096 Nov 12 12:17 .
drwxrwxrwx 1 cg cg 4096 Sep 9 10:53 ..
-rw-r--r-- 1 23941 23941 0 Nov 12 12:17 trial_nubyb_10_out.html
-rw-r--r-- 1 23941 23941 0 Nov 12 12:17 trial_nubyb_1_out.html
-rw-r--r-- 1 23941 23941 0 Nov 12 12:17 trial_nubyb_219_out.html
-rw-r--r-- 1 23941 23941 0 Nov 12 12:17 trial_nubyb_99_out.html
-rw-r--r-- 1 23941 23941 0 Nov 12 12:17 trial_nubyb_9_out.html
bash-4.4$ ls -lav
total 8
drwxrwxrwx 1 cg cg 4096 Nov 12 12:17 .
drwxrwxrwx 1 cg cg 4096 Sep 9 10:53 ..
-rw-r--r-- 1 23941 23941 0 Nov 12 12:17 trial_nubyb_1_out.html
-rw-r--r-- 1 23941 23941 0 Nov 12 12:17 trial_nubyb_9_out.html
-rw-r--r-- 1 23941 23941 0 Nov 12 12:17 trial_nubyb_10_out.html
-rw-r--r-- 1 23941 23941 0 Nov 12 12:17 trial_nubyb_99_out.html
-rw-r--r-- 1 23941 23941 0 Nov 12 12:17 trial_nubyb_219_out.html  

(Note that version sort actually uses somewhat more complex logic, but that won't affect your current use case.)
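Since the question only asks for the numbers, the ls -v listing can be trimmed further. The following is a minimal sketch, assuming GNU ls and that every file follows the trial_nubyb_$i_out.html pattern; the scratch directory and the top5 variable are made up for illustration.

```shell
# Scratch directory with a few sample files (illustrative only).
dir=$(mktemp -d)
for i in 1 9 10 99 219 1000; do
  : > "$dir/trial_nubyb_${i}_out.html"
done

# ls -v (GNU) sorts naturally, tail keeps the last five names,
# and cut extracts the third underscore-separated field (the number).
top5=$(ls -v "$dir" | tail -5 | cut -d_ -f3)
echo "$top5"

rm -r "$dir"
```

Dropping the -la flags keeps each line to a bare filename, so the cut step does not have to skip over permission and owner columns.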

Bash can't easily do what you want on its own. While it's possible to write sort functions in bash, it has none built in, so your best bet is to use other tools to sort.

While the GNU coreutils version of ls provides a -v option that does what you want, it's not portable and cannot be relied on in macOS, FreeBSD, Solaris, etc. A portable option might be the following:

ls -f | cut -d_ -f3 | sort -n | tail -5

The -f option for ls tells it not to sort its output at all. If you have FOUR MILLION files in a directory, and you're sorting the output, you probably want this.

cut is an easy way to split up a string. We set a delimiter and a list of fields to output.
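As a quick illustration of that cut step on its own, here is a small sketch using one sample filename; the name and number variables are hypothetical.

```shell
# Split one sample filename on "_" and keep the third field.
name="trial_nubyb_42_out.html"
number=$(printf '%s\n' "$name" | cut -d_ -f3)
echo "$number"   # prints 42
```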

sort -n sorts numerically. This isn't quite the same as the GNU "natural/version" sort from ls -v, but it may perform better. YMMV.

If you want to sort the filenames without splitting the numbers out of them, the sort command has provisions for that. Run man sort and look for the -t and -k options. These options are portable too. :)
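To see the whole pipeline at work, here is a small sketch run against a throwaway directory. The directory, the sample numbers, and the largest variable are assumptions for the demo, not part of the original setup.

```shell
# Scratch directory with a handful of sample files (illustrative only).
dir=$(mktemp -d)
for i in 1 9 10 99 219 1000 99999; do
  : > "$dir/trial_nubyb_${i}_out.html"
done

# ls -f lists entries unsorted and also emits "." and "..".
# cut passes lines without "_" through unchanged, and sort -n ranks
# those non-numeric lines as 0, so they never reach the last five here.
largest=$(ls -f "$dir" | cut -d_ -f3 | sort -n | tail -5)
echo "$largest"

rm -r "$dir"
```

Only the extracted numbers reach sort, so each line it has to compare is short.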

$ ls -lf | sort -t_ -k3n | tail -5

(This assumes you don't have any stray underscores before the filename.)
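The same idea applied to bare filenames (no long listing) might look like the sketch below. It uses -k3,3n rather than -k3n so that the sort key is limited to the third field; the scratch directory and the names variable are hypothetical.

```shell
# Scratch directory with a few sample files (illustrative only).
dir=$(mktemp -d)
for i in 1 9 10 99 219 1000; do
  : > "$dir/trial_nubyb_${i}_out.html"
done

# -t_ makes "_" the field separator; -k3,3n sorts numerically on field 3 only.
# "." and ".." have no third field, so they sort as 0 and land at the top.
names=$(ls -f "$dir" | sort -t_ -k3,3n | tail -3)
echo "$names"

rm -r "$dir"
```

Keeping whole filenames in the output means the result can be passed straight to other commands without rebuilding the names.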
