Python listing last 10 modified files and reading each line of all 10 files
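I need some help listing the files in a directory and reading each file with Python. I know how to do this with shell commands, but is there a Pythonic way to do it?
I would like to:
1.) List all files in a directory.
2.) Get the 10 most recently modified files (preferably using a wildcard).
3.) Read every line of all 10 files.
With shell commands, I can do: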
Linux_System# ls -ltr | tail -n 10
-rw-rw-rw- 1 root root 999934 Jul 26 01:06 data_log.569
-rw-rw-rw- 1 root root 999960 Jul 26 02:05 data_log.570
-rw-rw-rw- 1 root root 999968 Jul 26 03:13 data_log.571
-rw-rw-rw- 1 root root 999741 Jul 26 04:20 data_log.572
-rw-rw-rw- 1 root root 999928 Jul 26 05:31 data_log.573
-rw-rw-rw- 1 root root 999942 Jul 26 06:45 data_log.574
-rw-rw-rw- 1 root root 999916 Jul 26 07:46 data_log.575
-rw-rw-rw- 1 root root 999862 Jul 26 08:59 data_log.576
-rw-rw-rw- 1 root root 999685 Jul 26 10:15 data_log.577
-rw-rw-rw- 1 root root 999633 Jul 26 11:26 data_log.578
Linux_System# cat data_log.{569..578}
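Using glob, I am able to list the files and open a specific file, but I am not sure how to list only the last 10 modified files and feed that wildcard file list to the open function.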
import os, fnmatch, glob
files = glob.glob("data_event_log.*")
files.sort(key=os.path.getmtime)
print("\n".join(files))
data_event_log.569
data_event_log.570
data_event_log.571
data_event_log.572
data_event_log.573
data_event_log.574
data_event_log.575
data_event_log.576
data_event_log.577
data_event_log.578
import re
with open("data_event_log.560", 'r') as f:
    output_list = []
    for line in f.readlines():
        if line.startswith('Time'):
            lineRegex = re.compile(r'\d{4}-\d{2}-\d{2}')
            a = (lineRegex.findall(line))
That looks about right; you have almost everything done already:
import os.path, glob, re
files = glob.glob("data_event_log.*")
files.sort(key=os.path.getmtime)  # oldest first, newest last
latest = files[-10:]  # last 10 entries
print("\n".join(latest))
lineRegex = re.compile(r'\d{4}-\d{2}-\d{2}')  # compile once, outside the loops
for fn in latest:
    with open(fn) as f:
        for line in f:
            if line.startswith('Time'):
                a = lineRegex.findall(line)
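The loop above overwrites `a` on every matching line, so only the last match survives. A minimal sketch of one way to keep every result is to wrap the same steps in a generator. The names `latest_files` and `dates_in_time_lines` are hypothetical helpers introduced here for illustration, not part of the original answer, and the sketch assumes the files are plain text.

```python
import glob
import os
import re

DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")


def latest_files(pattern, count=10):
    """Return up to `count` regular files matching `pattern`, oldest first."""
    paths = [p for p in glob.glob(pattern) if os.path.isfile(p)]
    paths.sort(key=os.path.getmtime)
    return paths[-count:]


def dates_in_time_lines(pattern, count=10):
    """Yield (path, dates) for each line that starts with 'Time'."""
    for path in latest_files(pattern, count):
        with open(path) as f:
            for line in f:
                if line.startswith("Time"):
                    yield path, DATE_RE.findall(line)
```

Calling `list(dates_in_time_lines("data_event_log.*"))` would then collect one entry per matching line across the 10 newest files.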
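Edit:
A better and simpler solution, especially if you have many files, is heapq.nlargest, which picks out the 10 newest files without sorting the whole list: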
import os.path, glob, heapq, re
files = glob.iglob("data_event_log.*")  # lazy iterator, no full list in memory
latest = heapq.nlargest(10, files, key=os.path.getmtime)  # 10 newest entries, newest first
print("\n".join(latest))
lineRegex = re.compile(r'\d{4}-\d{2}-\d{2}')
for fn in latest:
    with open(fn) as f:
        for line in f:
            if line.startswith('Time'):
                a = lineRegex.findall(line)
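One difference from the sort-and-slice version is worth noting: heapq.nlargest returns its results in descending order, so the newest file comes first, whereas `ls -ltr | tail -n 10` and `files[-10:]` list the newest file last. The sketch below shows both orderings; `newest_first` and `oldest_first` are hypothetical helper names used only for this illustration.

```python
import glob
import heapq
import os


def newest_first(pattern, count=10):
    """The `count` most recently modified matches, newest first."""
    return heapq.nlargest(count, glob.iglob(pattern), key=os.path.getmtime)


def oldest_first(pattern, count=10):
    """Same files, reversed to match the order of `ls -ltr | tail`."""
    return newest_first(pattern, count)[::-1]
```

If the files should be read in chronological order, as `cat data_log.{569..578}` does, `oldest_first` is probably the variant to use.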
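What you are looking for is a fixed-size sorted buffer. collections.deque gives you a fixed-size buffer, but without the sorting. So here is a buffer that does what you need, and main shows you how to use it: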
import bisect
import glob
import os

class Buffer:
    """Sorted buffer that keeps at most maxlen items, ordered by key."""
    def __init__(self, maxlen, minmax=1, key=None):
        if key is None: key = lambda x: x
        self.key = key
        self.maxlen = maxlen
        self.buffer = []
        self.keys = []
        self.minmax = minmax  # 1 to track max values, -1 to track min values
        # iterator variables
        self.curr = 0

    def __iter__(self):
        self.curr = 0  # restart so the buffer can be iterated more than once
        return self

    def __next__(self):
        if self.curr >= len(self.buffer): raise StopIteration
        self.curr += 1
        return self.buffer[self.curr-1]

    def insert(self, x):
        key = self.key(x)
        idx = bisect.bisect_left(self.keys, key)  # keep keys and buffer sorted
        self.keys.insert(idx, key)
        self.buffer.insert(idx, x)
        if len(self.buffer) > self.maxlen:
            if self.minmax>0:
                # tracking max values: drop the smallest entries
                self.buffer = self.buffer[-1 * self.maxlen :]
                self.keys = self.keys[-1 * self.maxlen :]
            elif self.minmax<0:
                # tracking min values: drop the largest entries
                self.buffer = self.buffer[: self.maxlen]
                self.keys = self.keys[: self.maxlen]

def main():
    dirpath = "/path/to/directory"
    modtime = lambda fpath: os.stat(fpath).st_mtime
    buffer = Buffer(10, 1, modtime)
    for fpath in glob.glob(os.path.join(dirpath, "*data_event_log.*")):
        buffer.insert(fpath)
    for fpath in buffer:
        # open the file path and print whatever
        with open(fpath) as f:
            for line in f:
                print(line, end="")

if __name__ == "__main__":
    main()
Pythonic answer:
Use sorted() with a lambda function, then use list slicing to get the oldest 10, the newest 10, or whatever you need.
from glob import glob
from os import stat
files = glob("*")
sorted_list = sorted(files, key=lambda x: stat(x).st_mtime)
truncated_list = sorted_list[-10:]
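The snippet above stops at the list of file names, and glob("*") can also match directories, which cannot be opened for reading. As a rough sketch of the remaining step, the standard-library fileinput module can chain the selected files into one stream of lines, much like `cat` does. The function name `cat_latest` is a hypothetical helper, and the sketch assumes text files in the default encoding.

```python
import fileinput
import os
from glob import glob
from os import stat


def cat_latest(pattern="*", count=10):
    """Yield (filename, line) for every line of the newest `count` files."""
    files = [p for p in glob(pattern) if os.path.isfile(p)]  # skip directories
    files = sorted(files, key=lambda x: stat(x).st_mtime)[-count:]
    if not files:
        return  # an empty list would make FileInput fall back to stdin
    with fileinput.FileInput(files=files) as stream:
        for line in stream:
            yield stream.filename(), line.rstrip("\n")
```

A loop such as `for name, line in cat_latest("data_event_log.*"): print(name, line)` would then walk through the files from oldest to newest.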