Retrieving specific directories using os.walk()
I have a set of jobs (job1, job2, etc.) that run every hour and, after they complete, generate folders (session1, session2, etc.) containing the log files. Because of storage limitations, I need a script that deletes session directories older than a set time limit. I also want to specify that it must keep a given number of session directories, e.g. keep the latest 2 sessions, even if they are older than the time limit.
How can I achieve this using Python's os.walk()? I want to return a list of session directories to delete:

sessions_to_delete = []
/root
    /job1 (runs every hour)
        /session1
            /*log
        /session2
        /session3
    /job2
        /session1
        /session2
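For reference, the os.walk() approach the question asks about could be sketched as below. It assumes job directories sit directly under the root with names starting with "job", and session directories sit directly under each job with names starting with "session"; the function name find_session_dirs is hypothetical. Pruning dirnames in place (top-down walk) stops os.walk() from descending into the session directories and their log files.

```python
import os


def find_session_dirs(root):
    """Hypothetical helper: return {job_dir: [session_dir, ...]} via os.walk().

    Assumes the layout root/job*/session* shown in the question.
    """
    sessions = {}
    for dirpath, dirnames, _filenames in os.walk(root):
        if dirpath == root:
            # Top level: only descend into job directories.
            dirnames[:] = [d for d in dirnames if d.startswith("job")]
        else:
            # Job level: record the session directories...
            sessions[dirpath] = sorted(
                os.path.join(dirpath, d) for d in dirnames if d.startswith("session")
            )
            # ...and prune, so the walk never enters the sessions themselves.
            dirnames[:] = []
    return sessions
```

Each list can then be sorted by timestamp and filtered by age, as in the answers below.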
In this case, it is probably easier to list all directories with glob.glob(), using a pattern that matches your hierarchy. You can use os.path.getctime() to get a timestamp for each directory to sort and filter by (note that on Unix, ctime is the time of the last metadata change, not the creation time):
from glob import glob
import os.path
import time

def find_sessions_to_delete(cutoff):
    # produce a list of (timestamp, path) tuples for each session directory
    session_dirs = [(os.path.getctime(p), p) for p in glob('/root/job*/session*')]
    session_dirs.sort(reverse=True)  # sort from newest to oldest
    # skip the first two elements (the two newest sessions across all jobs);
    # they are kept regardless of age
    session_dirs = session_dirs[2:]
    # return a list of paths whose ctime lies before the cutoff time
    return [p for t, p in session_dirs if t <= cutoff]

cutoff = time.time() - (7 * 86400)  # 7 days ago
sessions_to_delete = find_sessions_to_delete(cutoff)
I included a sample cutoff of 7 days ago, calculated from time.time(), which returns a floating-point number expressing the seconds elapsed since 1 January 1970 (the UNIX epoch).
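The cutoff arithmetic works out as follows: a day has 24 × 60 × 60 = 86400 seconds, so 7 days is 604800 seconds. As an alternative (not from the answer), datetime.timedelta can express the same limit more readably; the constant names here are illustrative.

```python
import time
from datetime import timedelta

SECONDS_PER_DAY = 24 * 60 * 60                # 86400
max_age = timedelta(days=7).total_seconds()   # 604800.0, same as 7 * 86400

# time.time() is a float, so the cutoff is a float timestamp as well.
cutoff = time.time() - max_age
```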
If you need to do this per job directory, do the same work for each such directory and merge the resulting lists:
def find_sessions_to_delete(cutoff):
    to_delete = []
    # process each jobdir separately
    for jobdir in glob('/root/job*'):
        # produce a list of (timestamp, path) tuples for each session directory
        session_dirs = [(os.path.getctime(p), p)
                        for p in glob(os.path.join(jobdir, 'session*'))]
        session_dirs.sort(reverse=True)  # sort from newest to oldest
        # remove first two elements, they are kept regardless
        session_dirs = session_dirs[2:]
        # Add list of paths whose ctime lies before the cutoff time
        to_delete.extend(p for t, p in session_dirs if t <= cutoff)
    return to_delete
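The per-job function above hard-codes '/root', the keep count of 2 and os.path.getctime. A minimal parametrized sketch of the same algorithm follows; the name find_old_sessions and its root, max_age, keep, get_time and now parameters are hypothetical additions. It defaults to os.path.getmtime because, unlike ctime, mtime can be set with os.utime(), which makes the function easy to test. Keep in mind that a directory's mtime changes when entries are added to or removed from it, not when an existing log file inside it is appended to.

```python
import os
import time
from glob import glob


def find_old_sessions(root, max_age, keep=2, get_time=os.path.getmtime, now=None):
    """Hypothetical variant: return session dirs older than max_age seconds,
    always keeping the newest `keep` sessions of each job directory."""
    if now is None:
        now = time.time()
    cutoff = now - max_age
    to_delete = []
    for jobdir in glob(os.path.join(root, "job*")):
        # (timestamp, path) pairs for this job, sorted newest first
        session_dirs = sorted(
            ((get_time(p), p)
             for p in glob(os.path.join(jobdir, "session*"))
             if os.path.isdir(p)),
            reverse=True,
        )
        # Skip the newest `keep` entries, then filter the rest by age
        to_delete.extend(p for t, p in session_dirs[keep:] if t <= cutoff)
    return sorted(to_delete)
```

For example, find_old_sessions('/root', 7 * 86400) returns the same kind of list as the original, but with keep and the timestamp function adjustable per call.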
You can use os.path.getatime(path) or os.path.getmtime(path) to figure out how "old" a folder is and then do what you need with it. Here is the basic information about the os.path module: https://docs.python.org/2/library/os.path.html#module-os.path
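Turning either timestamp into an age is a subtraction from time.time(). The helper name age_in_days below is hypothetical. Be aware that access times can be unreliable, since many Linux systems mount filesystems with relatime or noatime, so mtime is usually the safer choice.

```python
import os
import time


def age_in_days(path, use_atime=False):
    """Hypothetical helper: days since `path` was last modified (or accessed)."""
    stamp = os.path.getatime(path) if use_atime else os.path.getmtime(path)
    return (time.time() - stamp) / 86400
```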
One approach to solving your problem could be this:
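import os
import shutil
import time

time_limit = 7 * 86400  # example limit: 7 days, in seconds
# list_of_folders: the session folder paths you collected, oldest first
for folder in list_of_folders:
    if time.time() - os.path.getmtime(folder) > time_limit:
        shutil.rmtree(folder)  # delete the folder and everything in it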
If you build up list_of_folders with append() in chronological order (oldest first), you can keep the last two folders by changing the for loop like this:
for folder in list_of_folders[:-2]:
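The [:-2] trick depends on the list already being ordered oldest to newest. A minimal sketch that sorts by mtime explicitly instead is shown below. The name prune_sessions, the keep and dry_run parameters, and treating every subdirectory of job_dir as a session are assumptions, not part of the answer. It also avoids a slicing pitfall: folders[:-0] is an empty list, so folders[:-keep] would keep everything when keep is 0.

```python
import os
import shutil
import time


def prune_sessions(job_dir, time_limit, keep=2, dry_run=True):
    """Hypothetical helper: delete (or, with dry_run, just list) subfolders of
    job_dir older than time_limit seconds, always keeping the `keep` newest."""
    folders = [
        os.path.join(job_dir, name)
        for name in os.listdir(job_dir)
        if os.path.isdir(os.path.join(job_dir, name))
    ]
    # Sort oldest -> newest by mtime instead of relying on append() order
    folders.sort(key=os.path.getmtime)
    # Explicit end index, because folders[:-0] would be empty when keep == 0
    candidates = folders[:max(len(folders) - keep, 0)]
    now = time.time()
    removed = []
    for folder in candidates:
        if now - os.path.getmtime(folder) > time_limit:
            if not dry_run:
                shutil.rmtree(folder)
            removed.append(folder)
    return removed
```

Defaulting to dry_run=True is a design choice: you can inspect the returned list before calling it again with dry_run=False to actually delete.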