Retrieving specific directories using os.walk()

I have a set of jobs (job1, job2, etc.) that run every hour; when each run completes, it generates a folder (session1, session2, etc.) containing the log files. Due to storage limitations, I need a script that deletes session directories older than a set time limit, but that also always keeps a specified number of session directories, e.g. the latest 2 sessions, even if they are older than the time limit.

How can I achieve this using Python's os.walk()? I want to return the session directories to delete as a list, sessions_to_delete = [].

/root    
    /job1             (runs every hour)
        /session1
            /*log
        /session2
        /session3
    /job2
        /session1
        /session2

In this case, it is probably easier to list all the directories with glob.glob(), using a pattern that matches your hierarchy. You can use os.path.getctime() to get a timestamp for each directory to sort and filter by. (On Windows, ctime is the creation time; on Unix, it is the time of the last metadata change.)

from glob import glob
import os.path
import time

def find_sessions_to_delete(cutoff):
    # produce a list of (timestamp, path) tuples for each session directory
    session_dirs = [(os.path.getctime(p), p) for p in glob('/root/job*/session*')]
    session_dirs.sort(reverse=True)  # sort from newest to oldest
    # remove first two elements, they are kept regardless
    session_dirs = session_dirs[2:]
    # return a list of paths whose ctime lies before the cutoff time
    return [p for t, p in session_dirs if t <= cutoff]

cutoff = time.time() - (7 * 86400)  # 7 days ago
sessions_to_delete = find_sessions_to_delete(cutoff)

I included a sample cutoff of 7 days ago, calculated from time.time(), which returns a floating-point number of seconds elapsed since January 1, 1970 (the UNIX epoch).
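Because time.time() returns a float, the cutoff is a float as well; the arithmetic behind it is 7 * 86400 = 604800 seconds. As a small sketch, the helper below wraps that calculation. `cutoff_days_ago` and `SECONDS_PER_DAY` are hypothetical names, not part of any library, and `datetime.timedelta` is used only to confirm the same number of seconds:

```python
import time
from datetime import timedelta

SECONDS_PER_DAY = 24 * 60 * 60  # 86400

def cutoff_days_ago(days, now=None):
    """Return a UNIX timestamp `days` days before `now` (defaults to time.time())."""
    if now is None:
        now = time.time()
    return now - days * SECONDS_PER_DAY

# timedelta expresses the same interval; total_seconds() returns a float
assert timedelta(days=7).total_seconds() == 7 * SECONDS_PER_DAY == 604800
```

Passing `now` explicitly makes the cutoff reproducible, which is handy when testing the deletion logic.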

If you need to do this per job directory, do the same work for each job directory and merge the resulting lists:

def find_sessions_to_delete(cutoff):
    to_delete = []

    # process each jobdir separately
    for jobdir in glob('/root/job*'):
        # produce a list of (timestamp, path) tuples for each session directory
        session_dirs = [(os.path.getctime(p), p)
                        for p in glob(os.path.join(jobdir, 'session*'))]
        session_dirs.sort(reverse=True)  # sort from newest to oldest
        # remove first two elements, they are kept regardless
        session_dirs = session_dirs[2:]
        # Add list of paths whose ctime lies before the cutoff time
        to_delete.extend(p for t, p in session_dirs if t <= cutoff)

    return to_delete
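Since the question asks about os.walk() specifically, here is a sketch of the same per-job logic built on os.walk() instead of glob(). It assumes the root/job*/session* layout shown in the question; the function name and the `keep` and `get_time` parameters are hypothetical additions. `get_time` defaults to os.path.getctime to match the answer above, and can be swapped for os.path.getmtime. Pruning `dirnames` in place (allowed when topdown is true, the default) stops the walk from descending into the log files inside each session:

```python
import os

def find_sessions_to_delete_walk(root, cutoff, keep=2, get_time=os.path.getctime):
    """Return session directories with a timestamp at or before `cutoff`,
    keeping the `keep` newest sessions in each job directory regardless of age.

    Assumes the layout root/job*/session* from the question.
    """
    root = os.path.normpath(root)
    to_delete = []
    for dirpath, dirnames, _filenames in os.walk(root):
        if dirpath == root:
            # At the top level, only descend into job directories
            dirnames[:] = [d for d in dirnames if d.startswith('job')]
            continue
        # Here dirpath is a job directory; its subdirectories are sessions
        sessions = [os.path.join(dirpath, d) for d in dirnames
                    if d.startswith('session')]
        dirnames[:] = []  # do not walk into the session directories themselves
        # Sort (timestamp, path) pairs from newest to oldest
        stamped = sorted(((get_time(p), p) for p in sessions), reverse=True)
        # Skip the `keep` newest, then select those at or before the cutoff
        to_delete.extend(p for t, p in stamped[keep:] if t <= cutoff)
    return to_delete
```

For example, find_sessions_to_delete_walk('/root', time.time() - 7 * 86400) returns the same kind of list as the glob() version.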

You can use os.path.getatime(path) or os.path.getmtime(path) to find out how "old" a folder is, and then do whatever you need with it. Here is the basic documentation for the os.path module: https://docs.python.org/2/library/os.path.html#module-os.path
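A minimal sketch of measuring a folder's age in days from these timestamps; `folder_age_days` is a hypothetical helper. Two caveats are worth knowing: on typical POSIX filesystems a directory's mtime changes when entries are created, removed, or renamed inside it, not when an existing file inside it is rewritten; and atime may not be updated on every access if the filesystem is mounted with noatime or relatime.

```python
import os
import time

def folder_age_days(path, use_atime=False, now=None):
    """Return how many days ago `path` was last modified (or accessed)."""
    # getmtime: last modification; getatime: last access (seconds since the epoch)
    stamp = os.path.getatime(path) if use_atime else os.path.getmtime(path)
    if now is None:
        now = time.time()
    return (now - stamp) / 86400
```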

One approach to solving your problem could be this:

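import os
import shutil
import time

time_limit = 7 * 86400  # example limit: 7 days, in seconds
list_of_folders = []    # fill in with the session directory paths to check

for folder in list_of_folders:
    # getmtime() returns the last modification time in seconds since the epoch
    if time.time() - os.path.getmtime(folder) > time_limit:
        shutil.rmtree(folder)  # delete the folder and the log files inside it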

If you build up list_of_folders with append(), adding folders from oldest to newest, the last two entries are the newest ones, and you can keep them by changing the for loop like this:

for folder in list_of_folders[:-2]:
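Relying on append() order only works if the folders were appended oldest first. A sketch that does not depend on insertion order sorts by os.path.getmtime() first and then skips the newest entries. `prune_folders`, `keep` and `dry_run` are hypothetical names, and shutil.rmtree() stands in for the undefined delete_folder(); with dry_run=True (the default here) nothing is deleted and the function only returns what it would remove:

```python
import os
import shutil
import time

def prune_folders(list_of_folders, time_limit, keep=2, dry_run=True):
    """Select folders older than `time_limit` seconds, always keeping the
    `keep` most recently modified ones; delete them unless dry_run is set.
    Returns the selected paths."""
    # Sort oldest first by mtime, so the newest `keep` folders are at the end
    folders = sorted(list_of_folders, key=os.path.getmtime)
    now = time.time()
    # Guard keep == 0: folders[:-0] would be an empty list
    candidates = folders[:-keep] if keep else folders
    selected = [f for f in candidates if now - os.path.getmtime(f) > time_limit]
    if not dry_run:
        for folder in selected:
            shutil.rmtree(folder)  # removes the folder and everything in it
    return selected
```

The explicit `if keep` guard matters because a slice like list_of_folders[:-0] is an empty list, not the whole list.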

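Notice: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.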

 
Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM