How to plot timeseries from grouped logs with pandas?
I'm trying to analyze a log file using Pandas. I want to plot three lines for the count of levels "ERROR", "INFO", and "WARN" per second, with x = date (seconds) and y = count.
After importing my log file, my data frame looks like this:
df_logs
I floor the date to the second:
df_logs['date'] = df_logs['date'].dt.floor('S')
Then I group by date and message level:
ds_grouped = df_logs.groupby(['date','level'])['level'].count()
From here, I'm completely stuck:
type(ds_grouped)
> pandas.core.frame.DataFrame
I guess the correct seaborn plot is:
sns.lineplot(x='date',
             y='count',
             hue='level',
             data=ds_grouped)
How do I plot the grouped data frame?
Here is a way to create the plot, IIUC:
# create test data
import numpy as np
import pandas as pd
n = 10_000
np.random.seed(123)
timestamps = pd.date_range(start='2020-08-27 09:00:00',
                           periods=60*60*4, freq='1s')
level = ['info', 'info', 'info', 'warn', 'warn', 'error']
df = pd.DataFrame(
    {'timestamp': np.random.choice(timestamps, n),
     'level': np.random.choice(level, n),})
print(df.head())
timestamp level
0 2020-08-27 09:59:42 info
1 2020-08-27 12:14:06 warn
2 2020-08-27 09:22:26 info
3 2020-08-27 12:24:12 error
4 2020-08-27 10:26:58 info
Second, aggregate the counts into 5-minute intervals. You can change the frequency in pd.Grouper below (for example, freq='1s' for per-second counts):
t = (df.assign(counter = 1)
       .set_index('timestamp')
       .groupby([pd.Grouper(freq='5min'), 'level']).sum()
       .squeeze()
       .unstack())
print(t.head())
level error info warn
timestamp
2020-08-27 09:00:00 35 123 66
2020-08-27 09:05:00 32 91 73
2020-08-27 09:10:00 41 113 64
2020-08-27 09:15:00 32 110 66
2020-08-27 09:20:00 35 107 61
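Since the question asks for counts per second, the same grouping can be sketched with freq='1s'. This is a minimal sketch rather than the answer author's code: it assumes a frame shaped like the test data, uses size() instead of a helper counter column, and the name per_second is made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, shaped like the test data above
np.random.seed(123)
timestamps = pd.date_range(start='2020-08-27 09:00:00', periods=60, freq='1s')
df = pd.DataFrame({
    'timestamp': np.random.choice(timestamps, 500),
    'level': np.random.choice(['info', 'warn', 'error'], 500),
})

# Count rows per (second, level); size() counts rows directly,
# so no extra counter column is needed
per_second = (df.groupby([pd.Grouper(key='timestamp', freq='1s'), 'level'])
                .size()
                .unstack(fill_value=0))  # one column per level, 0 where a level is missing

print(per_second.head())
```

Passing fill_value=0 to unstack keeps the columns as integers and avoids NaN gaps in the lines when a level does not occur in a given second.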
Third, create the plot with t.plot(), which draws one line per level.
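If you would rather keep the seaborn call from the question, the grouped result needs to be a long-form frame with real 'date', 'level' and 'count' columns. The following is a sketch under assumptions: the data is randomly generated, and the names counts and plot_counts are hypothetical helpers, not part of pandas or seaborn.

```python
import numpy as np
import pandas as pd

# Hypothetical log data using the column names from the question
np.random.seed(123)
timestamps = pd.date_range(start='2020-08-27 09:00:00', periods=60*60, freq='1s')
df = pd.DataFrame({
    'date': np.random.choice(timestamps, 2_000),
    'level': np.random.choice(['INFO', 'WARN', 'ERROR'], 2_000),
})

# Long-form counts: one row per (interval, level), with the count
# stored in a regular column named 'count'
counts = (df.groupby([pd.Grouper(key='date', freq='5min'), 'level'])
            .size()
            .reset_index(name='count'))

def plot_counts(counts):
    """Draw one line per level (hypothetical helper; needs seaborn and matplotlib)."""
    import matplotlib.pyplot as plt
    import seaborn as sns
    ax = sns.lineplot(x='date', y='count', hue='level', data=counts)
    plt.show()
    return ax
```

Calling plot_counts(counts) then draws the three lines. The key step is reset_index(name='count'), which turns the grouped Series into columns that lineplot can refer to by name.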