Analyze logs with Python
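I have a CSV file with logs. I need to analyze it and select the necessary information from the file. The problem is that it contains many tables, each with its own header row. The tables have no names. They are separated from one another by blank lines. Suppose I need to select all the data from the %idle column where CPU = all.
Structure: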
09:20:06,CPU,%usr,%nice,%sys,%iowait,%steal,%irq,%soft,%guest,%idle
09:21:06,all,4.98,0.00,5.10,0.00,0.00,0.00,0.06,0.00,89.86
09:21:06,0,12.88,0.00,5.62,0.03,0.00,0.02,1.27,0.00,80.18

12:08:06,CPU,%usr,%nice,%sys,%iowait,%steal,%irq,%soft,%guest,%idle
12:09:06,all,5.48,0.00,5.24,0.00,0.00,0.00,0.12,0.00,89.15
12:09:06,0,18.57,0.00,5.35,0.02,0.00,0.00,3.00,0.00,73.06

09:20:06,runq-sz,plist-sz,ldavg-1,ldavg-5,ldavg-15
09:21:06,3,1444,2.01,2.12,2.15
09:22:06,4,1444,2.15,2.14,2.15
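A fairly naive solution is to use a "plain" file reader on the original CSV. You can read everything up to the next blank line as a single CSV, and then parse the text you have just read into memory.
Each time you "see" a blank line, you know that what follows is an entirely new CSV, so you can repeat the process above for it.
For example, you would have a string containing: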
09:20:06,CPU,%usr,%nice,%sys,%iowait,%steal,%irq,%soft,%guest,%idle
09:21:06,all,4.98,0.00,5.10,0.00,0.00,0.00,0.06,0.00,89.86
09:21:06,0,12.88,0.00,5.62,0.03,0.00,0.02,1.27,0.00,80.18
You then parse it in memory. Once you reach the blank line, you know you need a new string containing:
12:08:06,CPU,%usr,%nice,%sys,%iowait,%steal,%irq,%soft,%guest,%idle
12:09:06,all,5.48,0.00,5.24,0.00,0.00,0.00,0.12,0.00,89.15
12:09:06,0,18.57,0.00,5.35,0.02,0.00,0.00,3.00,0.00,73.06
And so on: you can handle as many tables as you need this way.
You can use the following program to parse this CSV.
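The blank-line approach described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the helper name `split_tables` is hypothetical, the sample text is copied from the question's rows, and tables are assumed to be separated by at least one blank line.

```python
import csv
import io


def split_tables(text):
    """Yield each blank-line-separated table as a list of parsed CSV rows.

    Hypothetical helper: the name and signature are illustrative only.
    """
    chunk = []
    for line in text.splitlines():
        if line.strip():
            chunk.append(line)
        elif chunk:
            # Blank line: the current table is complete, so parse it in memory.
            yield list(csv.reader(io.StringIO("\n".join(chunk))))
            chunk = []
    if chunk:
        # The last table may not be followed by a blank line.
        yield list(csv.reader(io.StringIO("\n".join(chunk))))


# Sample input: two of the tables from the question, separated by a blank line.
sample = (
    "09:20:06,CPU,%usr,%nice,%sys,%iowait,%steal,%irq,%soft,%guest,%idle\n"
    "09:21:06,all,4.98,0.00,5.10,0.00,0.00,0.00,0.06,0.00,89.86\n"
    "09:21:06,0,12.88,0.00,5.62,0.03,0.00,0.02,1.27,0.00,80.18\n"
    "\n"
    "09:20:06,runq-sz,plist-sz,ldavg-1,ldavg-5,ldavg-15\n"
    "09:21:06,3,1444,2.01,2.12,2.15\n"
    "09:22:06,4,1444,2.15,2.14,2.15\n"
)

tables = list(split_tables(sample))
for table in tables:
    print(table[0])  # header row of each table
```

For a real file, the same helper could be fed the result of `f.read()`; each yielded table is an ordinary list of rows, with the header as its first element.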
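result = {}
with open("log.csv", "r") as f:
    # Tables are separated by blank lines, so split the text on "\n\n".
    for table in f.read().split("\n\n"):
        rows = table.strip().split("\n")
        header = rows[0].split(",")
        for row in rows[1:]:
            # Skip the first (timestamp) column and pair each column name with its value.
            for name, value in zip(header[1:], row.split(",")[1:]):
                result.setdefault(name, []).append(value)
print(result["%idle"])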
Output (the values of %idle):
['89.86', '80.18', '89.15', '73.06']
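This assumes that the values in each row are in the same order as the columns in that table's header, and that no two different kinds of table share a column name. Values from tables with identical headers are appended to the same lists.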
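Note that the program above collects %idle for every CPU row, whereas the question asks only for rows where CPU = all. A minimal sketch of that filter is shown here; the helper name `idle_for_all_cpus` and the placeholder column name `time` are assumptions made for the example, and blank-line-separated tables are assumed as before.

```python
import csv
import io


def idle_for_all_cpus(text):
    """Return the %idle values (as floats) from rows where CPU == "all".

    Hypothetical helper: the name and signature are illustrative only.
    """
    values = []
    for block in text.split("\n\n"):
        lines = [line for line in block.splitlines() if line.strip()]
        if not lines:
            continue
        header = lines[0].split(",")
        # Only tables that have both columns are relevant here.
        if "CPU" not in header or "%idle" not in header:
            continue
        # The first header cell is a timestamp, not a column name, so
        # it is replaced with a placeholder name ("time", an assumption).
        fieldnames = ["time"] + header[1:]
        reader = csv.DictReader(io.StringIO("\n".join(lines[1:])),
                                fieldnames=fieldnames)
        for row in reader:
            if row["CPU"] == "all":
                values.append(float(row["%idle"]))
    return values


# Sample input copied from the question, with blank lines between tables.
sample = """\
09:20:06,CPU,%usr,%nice,%sys,%iowait,%steal,%irq,%soft,%guest,%idle
09:21:06,all,4.98,0.00,5.10,0.00,0.00,0.00,0.06,0.00,89.86
09:21:06,0,12.88,0.00,5.62,0.03,0.00,0.02,1.27,0.00,80.18

12:08:06,CPU,%usr,%nice,%sys,%iowait,%steal,%irq,%soft,%guest,%idle
12:09:06,all,5.48,0.00,5.24,0.00,0.00,0.00,0.12,0.00,89.15
12:09:06,0,18.57,0.00,5.35,0.02,0.00,0.00,3.00,0.00,73.06

09:20:06,runq-sz,plist-sz,ldavg-1,ldavg-5,ldavg-15
09:21:06,3,1444,2.01,2.12,2.15
09:22:06,4,1444,2.15,2.14,2.15
"""

print(idle_for_all_cpus(sample))  # [89.86, 89.15]
```

Because each table is checked for the CPU and %idle columns before it is read, tables of other kinds are skipped rather than merged, which sidesteps the shared-column-name assumption.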