How to recursively find the latest modified file in a directory?

It seems that ls doesn't sort the files correctly when doing a recursive call:

ls -altR . | head -n 3

How can I find the most recently modified file in a directory (including subdirectories)?

find . -type f -printf '%T@ %p\n' \
| sort -n | tail -1 | cut -f2- -d" "

For a huge tree, it might be hard for sort to keep everything in memory.

%T@ gives you the modification time as a Unix timestamp, sort -n sorts numerically, tail -1 takes the last line (highest timestamp), and cut -f2- -d" " cuts away the first field (the timestamp) from the output.

Edit: Just as -printf is probably GNU-only, ajreal's use of stat -c is too. It is possible to do the same on BSD, but the formatting options are different (-f "%m %N", it would seem).

I also missed the plural in the question; if you want more than just the latest file, bump up the tail argument.
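The pipeline above can be wrapped in a small function so that the directory and the number of files become arguments. This is only a minimal sketch: it assumes GNU find (for -printf), and the name latest_files and its two parameters are made up for illustration.

```shell
# latest_files [DIR] [N]: print the N most recently modified files under DIR,
# newest last. Hypothetical helper; needs GNU find for -printf.
latest_files() {
    dir=${1:-.}
    n=${2:-1}
    # %T@ = mtime as seconds since the epoch, %p = path.
    find "$dir" -type f -printf '%T@ %p\n' \
        | sort -n \
        | tail -n "$n" \
        | cut -d' ' -f2-
}
```

For example, latest_files . 3 lists the three newest files below the current directory. Like the original pipeline, it is line-based, so filenames containing newlines would confuse it.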

Following up on @plundra's answer, here's the BSD and OS X version:

find . -type f -print0 \
| xargs -0 stat -f "%m %N" \
| sort -rn | head -1 | cut -f2- -d" "

Instead of sorting the results and keeping only the last modified ones, you could use awk to print only the one with the greatest modification time (in Unix time):

find . -type f -printf "%T@\0%p\0" | awk '
    {
        if ($0>max) {
            max=$0; 
            getline mostrecent
        } else 
            getline
    } 
    END{print mostrecent}' RS='\0'

This should be a faster way to solve your problem if the number of files is big enough.

I have used the NUL character (i.e. '\0') because, theoretically, a filename may contain any character (including space and newline) except that one.

If you don't have such pathological filenames in your system you can use the newline character as well:

find . -type f -printf "%T@\n%p\n" | awk '
    {
        if ($0>max) {
            max=$0; 
            getline mostrecent
        } else 
            getline
    } 
    END{print mostrecent}' RS='\n'

In addition, this works in mawk too.
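The same single-pass idea can also be sketched without getline, by keeping the timestamp and the path on one line and letting awk remember the best line seen so far. This is a variant rather than the code above; it assumes GNU find and filenames without newlines, and the function name newest_awk is hypothetical.

```shell
# newest_awk [DIR]: single pass over "timestamp path" lines, no sort.
# Hypothetical helper; needs GNU find for -printf.
newest_awk() {
    find "${1:-.}" -type f -printf '%T@ %p\n' | awk '
        # Adding 0 forces a numeric comparison of the timestamps.
        $1 + 0 > max { max = $1 + 0; line = $0 }
        END {
            if (line != "") {
                sub(/^[^ ]+ /, "", line)   # drop the timestamp field
                print line
            }
        }'
}
```

Only one line is held in memory at a time, which is the point of avoiding sort on a huge tree.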

This seems to work fine, even with subdirectories:

find . -type f | xargs ls -ltr | tail -n 1

If there are too many files, narrow down the find expression.

I had trouble finding the last modified file under Solaris 10. There, find does not have the printf option and stat is not available. I discovered the following solution, which works well for me:

find . -type f | sed 's/.*/"&"/' | xargs ls -E | awk '{ print $6," ",$7 }' | sort | tail -1

To show the filename as well, use:

find . -type f | sed 's/.*/"&"/' | xargs ls -E | awk '{ print $6," ",$7," ",$9 }' | sort | tail -1

Explanation

  • find . -type f finds and lists all files
  • sed 's/.*/"&"/' wraps the pathname in quotes to handle whitespace
  • xargs ls -E sends the quoted path to ls; the -E option makes sure that a full timestamp (format year-month-day hour-minute-seconds-nanoseconds) is returned
  • awk '{ print $6," ",$7 }' extracts only date and time
  • awk '{ print $6," ",$7," ",$9 }' extracts date, time and filename
  • sort returns the files sorted by date
  • tail -1 returns only the last modified file
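Where neither -printf nor stat is available, a different technique from the ls -E pipeline explained above is to compare files pairwise with the -nt ("newer than") operator of test. The sketch below assumes a shell whose test supports -nt (ksh, bash and dash do, but POSIX does not require it, so availability on a given Solaris shell is something to check). The function name newest_nt is hypothetical.

```shell
# newest_nt [DIR]: keep whichever file is newer while walking the tree.
# Hypothetical helper; assumes the shell's test supports -nt.
newest_nt() {
    find "${1:-.}" -type f | {
        newest=
        # Line-based reading: filenames containing newlines are not handled.
        while IFS= read -r f; do
            if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
                newest=$f
            fi
        done
        if [ -n "$newest" ]; then
            printf '%s\n' "$newest"
        fi
    }
}
```

No timestamps are parsed at all, so the output format of ls does not matter; the price is one test per file, which may be slow on very large trees.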

Shows the latest file with human readable timestamp:

find . -type f -printf '%TY-%Tm-%Td %TH:%TM: %Tz %p\n'| sort -n | tail -n1

Result looks like this:

2015-10-06 11:30: +0200 ./foo/bar.txt

To show more files, replace -n1 with a higher number.

I use something similar all the time, as well as the top-k list of most recently modified files. For large directory trees, it can be much faster to avoid sorting. In the case of just the top-1 most recently modified file:

find . -type f -printf '%T@ %p\n' | perl -ne '@a=split(/\s+/, $_, 2); ($t,$f)=@a if $a[0]>$t; print $f if eof()'

On a directory containing 1.7 million files, I get the most recent one in 3.4s, a speed-up of 7.5x against the 25.5s solution using sort.
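The top-k case mentioned above can be handled in one pass as well, by keeping a small list of the k newest entries seen so far. The following is a sketch in awk rather than perl, not the author's own top-k code: it assumes GNU find and newline-free filenames, and top_k_recent is a hypothetical name.

```shell
# top_k_recent [DIR] [K]: print the K newest files, newest first, in one pass.
# Hypothetical helper; needs GNU find for -printf.
top_k_recent() {
    find "${1:-.}" -type f -printf '%T@ %p\n' | awk -v k="${2:-5}" '
    BEGIN { k += 0 }
    {
        ts = $1 + 0
        name = substr($0, index($0, " ") + 1)
        # Buffer full and this file is not newer than the oldest kept one.
        if (n == k && ts <= t[n]) next
        if (n < k) n++
        # Insertion into the list, which is kept sorted newest first.
        i = n
        while (i > 1 && t[i - 1] < ts) { t[i] = t[i - 1]; f[i] = f[i - 1]; i-- }
        t[i] = ts; f[i] = name
    }
    END { for (i = 1; i <= n; i++) print f[i] }'
}
```

Each input line costs at most k comparisons, so for small k this stays close to a plain scan instead of sorting every line.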

This gives a sorted list:

find . -type f -ls 2>/dev/null | sort -M -k8,10 | head -n5

Reverse the order by adding '-r' to the sort command. If you only want filenames, insert "awk '{print $11}' |" before 'head'.

On Ubuntu 13, the following does it, maybe a tad faster, as it reverses the sort and uses 'head' instead of 'tail', reducing the work. To show the 11 newest files in a tree:

find . -type f -printf '%T@ %p\n' | sort -n -r | head -11 | cut -f2- -d" " | sed -e 's,^\./,,' | xargs ls -U -l

This gives a complete ls listing without re-sorting and omits the annoying './' that 'find' puts on every file name.

Or, as a bash function:

treecent () {
  local numl
  if [[ 0 -eq $# ]] ; then
    numl=11   # Or whatever default you want.
  else
    numl=$1
  fi
  find . -type f -printf '%T@ %p\n' | sort -n -r | head -${numl} |  cut -f2- -d" " | sed -e 's,^\./,,' | xargs ls -U -l
}

Still, most of the work was done by plundra's original solution. Thanks plundra.

I faced the same issue: I needed to find the most recent file recursively, and find took around 50 minutes to finish.

Here is a little script to do it faster:

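#!/bin/sh
# Descend into the most recently modified entry at each level
# until that entry is no longer a directory.

CURRENT_DIR='.'

zob () {
    # Newest entry of the current directory (-t sorts by mtime, -r reverses it).
    FILE=$(ls -Art1 "${CURRENT_DIR}" | tail -n 1)
    # Test the entry relative to CURRENT_DIR, not the working directory.
    if [ -n "${FILE}" ] && [ -d "${CURRENT_DIR}/${FILE}" ]; then
        CURRENT_DIR="${CURRENT_DIR}/${FILE}"
        zob
    fi
    echo "${CURRENT_DIR}/${FILE}"
    exit
}
zob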

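It's a recursive function that gets the most recently modified item of a directory. If this item is a directory, the function calls itself and searches that directory, and so on. Note that a directory's modification time changes only when entries are created, deleted or renamed in it, so a file edited in place deep in the tree can be missed.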

I find the following shorter and with more interpretable output:

find . -type f -printf '%TF %TT %p\n' | sort | tail -1

Given the fixed length of the standardised ISO format datetimes, lexicographical sorting is fine and we don't need the -n option on the sort.

If you want to remove the timestamps again, you can use:

find . -type f -printf '%TFT%TT %p\n' | sort | tail -1 | cut -f2- -d' '

If running stat on each file individually is too slow, you can use xargs to speed things up:

find . -type f -print0 | xargs -0 stat -f "%m %N" | sort -n | tail -1 | cut -f2- -d" " 

This recursively changes the modification time of every directory in the current directory to that of the newest file inside it:

for dir in */; do find $dir -type f -printf '%T@ "%p"\n' | sort -n | tail -1 | cut -f2- -d" " | xargs -I {} touch -r {} $dir; done

This simple cli will also work:

ls -1t | head -1

You may change the -1 passed to head to the number of files you want to list.

I found the command above useful, but in my case I needed to see the date and time of the file as well, and I had an issue with several files that have spaces in their names. Here is my working solution.

find . -type f -printf '%T@ %p\n' | sort -n | tail -1 | cut -f2- -d" " | sed 's/.*/"&"/' | xargs ls -l

I prefer this one, it is shorter:

find . -type f -print0|xargs -0 ls -drt|tail -n 1

The following command works on Solaris:

find . -name "*zip" -type f | xargs ls -ltr | tail -1 

I wrote a pypi/github package for this question because I needed a solution as well.

https://github.com/bucknerns/logtail

Install:

pip install logtail

Usage: tails changed files

logtail <log dir> [<glob match: default=*.log>]

Usage2: Opens latest changed file in editor

editlatest <log dir> [<glob match: default=*.log>]

Ignoring hidden files, with nice & fast time stamp

Here is how to find and list the latest modified files in a directory with subdirectories. Hidden files are ignored on purpose. The time format can be customised.

$ find . -type f -not -path '*/\.*' -printf '%TY.%Tm.%Td %THh%TM %Ta %p\n' |sort -nr |head -n 10

Result

Handles spaces in filenames well (not that these should be used!)

2017.01.25 18h23 Wed ./indenting/Shifting blocks visually.mht
2016.12.11 12h33 Sun ./tabs/Converting tabs to spaces.mht
2016.12.02 01h46 Fri ./advocacy/2016.Vim or Emacs - Which text editor do you prefer?.mht
2016.11.09 17h05 Wed ./Word count - Vim Tips Wiki.mht

More

More find galore following the link.

To search for files in /target_directory and all its sub-directories, that have been modified in the last 60 minutes:

$ find /target_directory -type f -mmin -60

To find the most recently modified files, sorted in the reverse order of update time (i.e., the most recently updated files first):

$ find /etc -type f -printf '%TY-%Tm-%Td %TT %p\n' | sort -r

After using a find-based solution for years, I found myself wanting the ability to exclude directories like .git.

I switched to this rsync-based solution. Put this in ~/bin/findlatest:

#!/bin/sh
# Finds most recently modified files.
rsync -rL --list-only "$@" | grep -v '^d' | sort -k3,4r | head -5

Now findlatest . will list the 5 most recently modified files, and findlatest --exclude .git . will list the 5 most recent ones outside .git.

This works by taking advantage of some little-used rsync functionality: "if a single source arg is specified [to rsync] without a destination, the files are listed in an output format similar to ls -l" (rsync man page).

The ability to take rsync args is useful in conjunction with rsync-based backup tools. For instance I use rsnapshot, and I back up an application directory with this rsnapshot.conf line:

backup  /var/atlassian/application-data/jira/current/   home    +rsync_long_args=--archive --filter="merge /opt/atlassian/jira/current/backups/rsync-excludes"

where rsync-excludes lists directories I don't want to back up:

- log/
- logs/
- analytics-logs/
- tmp/
- monitor/*.rrd4j

I can now see the latest files that will be backed up with:

findlatest /var/atlassian/application-data/jira/current/ --filter="merge /opt/atlassian/jira/current/backups/rsync-excludes"
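For comparison with the rsync approach above, find itself can skip a directory such as .git by using -prune. The following is a sketch of that alternative, not part of the findlatest script: it assumes GNU find (for -printf) and newline-free filenames, and the function name latest_excluding and its parameters are hypothetical.

```shell
# latest_excluding [DIR] [NAME] [N]: N newest files under DIR, skipping any
# directory called NAME. Hypothetical helper; needs GNU find for -printf.
latest_excluding() {
    # "-name NAME -prune" stops find from descending into matching entries;
    # everything else falls through to "-type f -printf".
    find "${1:-.}" -name "${2:-.git}" -prune -o -type f -printf '%T@ %p\n' \
        | sort -nr \
        | head -n "${3:-5}" \
        | cut -d' ' -f2-
}
```

Called as latest_excluding . .git 5, it prints the five newest files outside any .git directory, newest first. Unlike the rsync version it only excludes by name and does not read rsync filter files.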
