
Find the oldest file (recursively) in a directory

I'm writing a Python backup script and I need to find the oldest file in a directory (and its sub-directories). I also need to filter it down to *.avi files only.

The script will always be running on a Linux machine. Is there some way to do it in Python, or would running some shell commands be better?

At the moment I'm running df to get the free space on a particular partition, and if there is less than 5 gigabytes free, I want to start deleting the oldest *.avi files until that condition is met.
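As an aside on the df step: Python 3.3+ can read the same figure directly, since shutil.disk_usage returns the total, used and free bytes of the filesystem containing a path. The sketch below is only an illustration; needs_cleanup is a hypothetical helper name, and reading "5 gigabytes" as 5 GiB is an assumption.

```python
import shutil

# Assumption: "5 gigabytes" is taken to mean 5 GiB (5 * 1024**3 bytes).
FIVE_GB = 5 * 1024 ** 3

def needs_cleanup(partition_path, threshold=FIVE_GB):
    """Return True when free space on partition_path's filesystem is below threshold."""
    # disk_usage() returns a named tuple (total, used, free), all in bytes,
    # so there is no df output to parse.
    return shutil.disk_usage(partition_path).free < threshold

print(needs_cleanup("/"))
```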

Hm. Nadia's answer is closer to what you meant to ask; however, for finding the (single) oldest file in a tree, try this:

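import os

def oldest_file_in_tree(rootfolder, extension=".avi"):
    # Walk the whole tree, keep only names ending in `extension`, and let
    # min() pick the path with the smallest modification time.
    # Note: min() raises ValueError if no file matches.
    return min(
        (os.path.join(dirname, filename)
         for dirname, dirnames, filenames in os.walk(rootfolder)
         for filename in filenames
         if filename.endswith(extension)),
        key=lambda fn: os.stat(fn).st_mtime)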
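If an empty result is possible (no .avi files at all), Python 3.4+ lets min() take a default value instead of raising ValueError. A minimal variant of the function above, assuming Python 3.4 or later:

```python
import os

def oldest_file_in_tree(rootfolder, extension=".avi"):
    # Same generator as above; default=None (Python 3.4+) makes min()
    # return None instead of raising ValueError when nothing matches.
    return min(
        (os.path.join(dirname, filename)
         for dirname, _dirnames, filenames in os.walk(rootfolder)
         for filename in filenames
         if filename.endswith(extension)),
        key=lambda fn: os.stat(fn).st_mtime,
        default=None)
```

The caller then only has to check for None before deleting anything.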

With a little modification, you can get the n oldest files (similar to Nadia's answer):

import os, heapq

def oldest_files_in_tree(rootfolder, count=1, extension=".avi"):
    # heapq.nsmallest keeps only the `count` oldest paths seen so far,
    # so the whole list never has to be sorted.
    return heapq.nsmallest(count,
        (os.path.join(dirname, filename)
         for dirname, dirnames, filenames in os.walk(rootfolder)
         for filename in filenames
         if filename.endswith(extension)),
        key=lambda fn: os.stat(fn).st_mtime)

Note that using the .endswith method allows calls such as:

oldest_files_in_tree("/home/user", 20, (".avi", ".mov"))

to select more than one extension, because str.endswith also accepts a tuple of suffixes.
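To check the multi-extension call without touching real data, the sketch below builds a throwaway tree with tempfile and sets each file's modification time with os.utime. The file names and times are made up for the demonstration, and names like "sub/b.mov" assume Linux path separators.

```python
import heapq
import os
import tempfile

def oldest_files_in_tree(rootfolder, count=1, extension=".avi"):
    # Same function as above: `extension` may be one string or a tuple of
    # strings, because str.endswith accepts either form.
    return heapq.nsmallest(
        count,
        (os.path.join(dirname, filename)
         for dirname, _dirnames, filenames in os.walk(rootfolder)
         for filename in filenames
         if filename.endswith(extension)),
        key=lambda fn: os.stat(fn).st_mtime)

# Build a throwaway tree whose modification times are set explicitly.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "sub"))
    mtimes = {"a.avi": 300, "sub/b.mov": 100, "sub/c.avi": 200, "d.txt": 50}
    for name, mtime in mtimes.items():
        path = os.path.join(root, name)
        open(path, "w").close()
        os.utime(path, (mtime, mtime))  # (atime, mtime) in seconds
    oldest = oldest_files_in_tree(root, 2, (".avi", ".mov"))
    print([os.path.relpath(p, root) for p in oldest])  # ['sub/b.mov', 'sub/c.avi']
```

d.txt has the smallest mtime of all, but it is skipped because it matches neither suffix.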

Finally, should you want the complete list of files, ordered by modification time, so that you can delete as many as required to free space, here's some code:

import os
def files_to_delete(rootfolder, extension=".avi"):
    # Sorted newest first, so the oldest file ends up at the end of the list.
    return sorted(
        (os.path.join(dirname, filename)
         for dirname, dirnames, filenames in os.walk(rootfolder)
         for filename in filenames
         if filename.endswith(extension)),
        key=lambda fn: os.stat(fn).st_mtime,
        reverse=True)

Note that reverse=True puts the oldest files at the end of the list, so to get the next file to delete you just do file_list.pop().
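A toy illustration of that ordering, using made-up (mtime, name) pairs instead of real files:

```python
# Made-up (mtime, name) pairs standing in for real files.
files = [(300, "c.avi"), (100, "a.avi"), (200, "b.avi")]

# reverse=True sorts newest first, so the oldest entry ends up last.
file_list = [name for _mtime, name in sorted(files, reverse=True)]
print(file_list)        # ['c.avi', 'b.avi', 'a.avi']

# pop() takes from the end of the list: the oldest remaining file.
print(file_list.pop())  # a.avi
print(file_list.pop())  # b.avi
```

Popping from the end of a Python list is O(1), whereas pop(0) would shift every remaining element, which is one reason to sort in reverse rather than oldest first.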

By the way, for a complete solution to your issue: since you are running on Linux, where os.statvfs is available, you can do:

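import os
def free_space_up_to(free_bytes_required, rootfolder, extension=".avi"):
    # Oldest files are at the end of the list (see files_to_delete above).
    file_list = files_to_delete(rootfolder, extension)
    while file_list:
        statv = os.statvfs(rootfolder)
        # Free blocks times block size (see the update below about f_frsize).
        if statv.f_bfree * statv.f_bsize >= free_bytes_required:
            break
        os.remove(file_list.pop())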

statvfs.f_bfree is the number of free blocks on the device and statvfs.f_bsize is the block size. We call statvfs on rootfolder, so mind any symbolic links pointing to other devices: we could delete many files there without actually freeing up space on this device.

UPDATE (copying a comment by Juan):

Depending on the OS and filesystem implementation, you may want to multiply f_bfree by f_frsize rather than f_bsize. In some implementations, the latter is the preferred I/O request size. For example, on a FreeBSD 9 system I just tested, f_frsize was 4096 and f_bsize was 16384. POSIX says the block count fields are "in units of f_frsize" (see http://pubs.opengroup.org/onlinepubs/9699919799//basedefs/sys_statvfs.h.html)
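Putting Juan's correction into code could look like the sketch below. It is not the original answer's code: free_bytes is a hypothetical helper, the function returns the list of deleted paths (an addition for convenience), and it uses f_bavail (free blocks available to unprivileged processes, which is what df reports as "Avail") instead of f_bfree; swap f_bfree back in if root-reserved blocks should count as free. With the FreeBSD numbers above, f_bfree * f_bsize would overstate free space by a factor of 16384 / 4096 = 4. For comparison, CPython's shutil.disk_usage(path).free is also computed as f_bavail * f_frsize on Unix.

```python
import os

def free_bytes(path):
    # POSIX: block counts in struct statvfs are in units of f_frsize.
    # f_bavail = free blocks available to unprivileged processes (df's "Avail").
    st = os.statvfs(path)
    return st.f_bavail * st.f_frsize

def free_space_up_to(free_bytes_required, rootfolder, extension=".avi"):
    # Newest first, so pop() always returns the oldest remaining file.
    file_list = sorted(
        (os.path.join(dirname, filename)
         for dirname, _dirnames, filenames in os.walk(rootfolder)
         for filename in filenames
         if filename.endswith(extension)),
        key=lambda fn: os.stat(fn).st_mtime,
        reverse=True)
    deleted = []
    # Re-check free space before each deletion; stop when enough is free
    # or when there is nothing left to delete.
    while file_list and free_bytes(rootfolder) < free_bytes_required:
        path = file_list.pop()
        os.remove(path)
        deleted.append(path)
    return deleted  # paths removed, oldest first
```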

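To do this in Python, you can use os.walk(path) to iterate over the files recursively, and the st_size and st_mtime attributes of os.stat(filename) to get each file's size and modification time.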
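Because os.stat also gives st_size, you could decide up front how many of the oldest files need to go instead of re-checking free space after every deletion. This is a sketch under assumptions: plan_deletions is a hypothetical helper, and st_size is only an estimate of the space freed (a file with other hard links, or one still held open by a process, frees nothing until the last link or handle is gone).

```python
import os

def plan_deletions(rootfolder, bytes_needed, extension=".avi"):
    """Pick the oldest matching files whose sizes add up to at least bytes_needed.

    Nothing is deleted here; the caller decides what to do with the plan.
    """
    entries = []
    for dirpath, _dirnames, filenames in os.walk(rootfolder):
        for filename in filenames:
            if filename.endswith(extension):
                path = os.path.join(dirpath, filename)
                st = os.stat(path)
                entries.append((st.st_mtime, st.st_size, path))
    entries.sort()  # oldest first
    plan, total = [], 0
    for _mtime, size, path in entries:
        if total >= bytes_needed:
            break
        plan.append(path)
        total += size
    return plan, total
```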

You can use the stat and fnmatch modules together to find the files.

stat.ST_MTIME refers to the last modification time. You can choose another value if you want (for example stat.ST_ATIME for the last access time).

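import os, stat, fnmatch
file_list = []
# Note: os.listdir('.') only lists the current directory, not sub-directories.
for filename in os.listdir('.'):
    if fnmatch.fnmatch(filename, '*.avi'):
        # Index the os.stat() result with stat.ST_MTIME to get the mtime.
        file_list.append((os.stat(filename)[stat.ST_MTIME], filename))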

Then you can sort the list by time and delete files according to it.

file_list.sort(key=lambda a: a[0])
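The loop above only scans the current directory, while the question also asks for sub-directories. One way to keep the fnmatch style and still recurse is to combine os.walk with fnmatch.filter, which applies a shell-style pattern to a whole list of names. This is a sketch; avi_files_by_age is a hypothetical name, and it reads the st_mtime attribute (a float), the counterpart of os.stat(...)[stat.ST_MTIME], which gives whole seconds.

```python
import fnmatch
import os

def avi_files_by_age(rootfolder, pattern="*.avi"):
    # Collect (mtime, path) pairs from the whole tree, not just one directory.
    file_list = []
    for dirpath, _dirnames, filenames in os.walk(rootfolder):
        # fnmatch.filter returns the names that match the shell-style pattern.
        for filename in fnmatch.filter(filenames, pattern):
            path = os.path.join(dirpath, filename)
            file_list.append((os.stat(path).st_mtime, path))
    file_list.sort()  # oldest first
    return file_list
```

On Linux the match is case-sensitive, so a pattern such as "*.[aA][vV][iI]" would be needed to catch names like CLIP.AVI as well.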

I think the easiest way to do this would be to use find along with ls -t (sort files by time).

Something along these lines should do the trick (it deletes the oldest avi file under the directory passed to find, / in this example):

find / -name "*.avi" | xargs ls -t | tail -n 1 | xargs rm

Step by step:

find / -name "*.avi" - find all avi files recursively, starting at the root directory

xargs ls -t - sort all the files found by modification time, from newest to oldest

tail -n 1 - grab the last file in the list (the oldest)

xargs rm - and remove it

Caveats: xargs splits its input on whitespace, so file names containing spaces will break this pipeline, and with a very large number of files xargs may run ls -t in several batches, in which case the ordering only holds within each batch.
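Since the question asks whether to do this in Python or with shell commands, a middle ground is to run find from Python and do the sorting in Python. This sketch assumes GNU find (standard on Linux), whose -printf action with %T@ (modification time in seconds since the epoch), %p (the path) and \0 (a NUL byte) is used so that spaces or newlines in file names cannot break the parsing; oldest_via_find is a hypothetical name.

```python
import os
import subprocess

def oldest_via_find(rootfolder, pattern="*.avi"):
    # GNU find prints "<mtime> <path>" for every matching regular file,
    # each record terminated by a NUL byte (\0 in the -printf format).
    result = subprocess.run(
        ["find", rootfolder, "-type", "f", "-name", pattern,
         "-printf", r"%T@ %p\0"],
        stdout=subprocess.PIPE,
        check=True)  # raises CalledProcessError if find exits non-zero
    records = [rec for rec in result.stdout.split(b"\0") if rec]
    if not records:
        return None  # nothing matched
    # Split each record on the first space only: the path may contain spaces.
    oldest = min(records, key=lambda rec: float(rec.split(b" ", 1)[0].decode("ascii")))
    return os.fsdecode(oldest.split(b" ", 1)[1])
```

For example, oldest_via_find("/path/to/videos") would return the path of the oldest match, or None. Note that find exits non-zero when it hits unreadable directories, which check=True turns into an exception.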

Here's another Python formulation, which is a bit old-school compared to some of the others, but is easy to modify and handles the case of no matching files without raising an exception.

import os

def find_oldest_file(dirname="..", extension=".avi"):
    oldest_file, oldest_time = None, None
    for dirpath, dirs, files in os.walk(dirname):
        for filename in files:
            # Check the extension first so only matching files are stat'ed.
            if not filename.endswith(extension):
                continue
            file_path = os.path.join(dirpath, filename)
            file_time = os.stat(file_path).st_mtime
            # Test for None first: comparing a float with None raises TypeError in Python 3.
            if oldest_time is None or file_time < oldest_time:
                oldest_file, oldest_time = file_path, file_time
    # Returns (None, None) when nothing matched.
    return oldest_file, oldest_time

print(find_oldest_file())

Check out the Linux command find.

Alternatively, this post pipes ls into head to delete the oldest file in a directory. That could be done in a loop while there isn't enough free space.

For reference, here's the shell code that does it (follow the link for more alternatives and a discussion):

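cd /path/to/files && ls -t -r -1 | head --lines 1 | xargs rm

Note that ls prints bare file names, which is why the command has to run from inside the directory; otherwise rm would look for those names in the current directory.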

The os module provides the functions you need to get directory listings and file info in Python. I've found os.walk to be especially useful for walking directories recursively, and os.stat will give you detailed info (including modification time) on each entry.

You may be able to do this more easily with a simple shell command. Whether that works better for you depends on what you want to do with the results.

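Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.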

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM