Quickly list random set of files in directory in Linux

Question: I am looking for a performant, concise way to list N randomly selected files in a Linux directory using only Bash. The files must be randomly selected from different subdirectories.

Why I'm asking: In Linux, I often want to test a random selection of files in a directory for some property. The directories contain thousands of files, so I only want to test a small number of them, but I want to take them from different subdirectories in the directory of interest.

The following returns the paths of 50 "randomly"-selected files:

find /dir/of/interest/ -type f | sort -R | head -n 50

The directory contains many files and resides on a mounted file system with slow read times (accessed through ssh), so the command can take many minutes. I believe the issue is that find first has to walk the whole tree and list every file (slow), and only then can the rest of the pipeline print a random selection.
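To make the bottleneck concrete, here is a minimal sketch, assuming GNU find and GNU shuf are installed. The function name sample_files is hypothetical. Replacing sort -R | head with shuf -n avoids sorting the full list, but find still has to walk the entire tree, so on a slow mount most of the time is likely still spent in the traversal.

```shell
#!/usr/bin/env bash

# sample_files DIR [N]: print N (default 50) randomly chosen regular files
# under DIR. Hypothetical helper; assumes GNU find and GNU shuf.
sample_files() {
  local dir=$1 n=${2:-50}
  # shuf -n picks n lines at random without sorting the whole input.
  # find must still enumerate every file before the pipeline can finish.
  find "$dir" -type f | shuf -n "$n"
}
```

For example, sample_files /dir/of/interest/ 50 prints up to 50 paths, one per line. Paths that contain newlines are not handled by this sketch.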

If you use locate and its database is refreshed by updatedb regularly (daily is probably the default), you could:

$ locate /home/james/test | sort -R | head -5
/home/james/test/10kfiles/out_708.txt
/home/james/test/10kfiles/out_9637.txt
/home/james/test/compr/bar
/home/james/test/10kfiles/out_3788.txt
/home/james/test/test

How often do you need it? Do the work periodically in advance so the result is quickly available when you need it.

Create a refreshList script.

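#!/usr/bin/env bash

# Build the list in /tmp first, then move it into place, so ~/rand.list
# is not read while it is still being written.
find /dir/of/interest/ -type f | sort -R | head -n 50 >/tmp/rand.list
mv -f /tmp/rand.list ~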

Put it in your crontab.

0 7-20 * * 1-5 nice -n 19 ~/refreshList

Then, between 07:00 and 20:00 on weekdays, you will always have a ~/rand.list that is about an hour old at most.

If you don't want to use cron and aren't too picky about how old the list is, just write a function that prints the current list and then refreshes it in the background every time you use it.

randFiles() {
  # Print the list built by the previous call.
  cat ~/rand.list
  # Rebuild it in the background for next time.
  {
    find /dir/of/interest/ -type f | sort -R | head -n 50 >/tmp/rand.list
    mv -f /tmp/rand.list ~
  } &
}

If you can't run locate and the find command is too slow, is there any reason this has to be done in real time?

Would it be possible to use cron to dump the output of the find command into a file and then do the random pick from there?
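A minimal sketch of that idea, assuming GNU shuf is available. The list location ~/files.list and the function names refresh_list and pick_random are hypothetical. The slow find runs only when refresh_list is called (for instance from a cron job), while pick_random reads the cached list and never touches the slow file system.

```shell
#!/usr/bin/env bash

# Hypothetical location of the cached file list; override by setting LIST.
LIST=${LIST:-$HOME/files.list}

# refresh_list DIR: run the slow find once and publish the full file list.
# Meant to be called periodically, e.g. from a cron job.
refresh_list() {
  local tmp
  # Temp file next to the final list, so the rename stays on one file system.
  tmp=$(mktemp "$LIST.XXXXXX") || return 1
  find "$1" -type f > "$tmp"
  # Rename into place so readers never see a partially written list.
  mv -f "$tmp" "$LIST"
}

# pick_random [N]: print N (default 50) random paths from the cached list.
pick_random() {
  shuf -n "${1:-50}" "$LIST"
}
```

Because the whole listing is cached rather than a fixed sample of 50, each call to pick_random returns a fresh random subset, at the cost of the list being as stale as the last refresh.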
