Creating Horizontal Bar Plot With Time-Series Data in Python

I have the following problem:

Given a pandas dataframe with a number of unique hostnames, I would like to plot a horizontal bar graph that shows how long a particular issue lasted for each hostname.

I have the following code:

# Create a bar plot for each unique system name of all ticket entries
for sys_name in unique_sys_names:
    # Grab the df that refers to just the issues with that system name
    j_data_sys = eff_j_data[eff_j_data['System Name'] == sys_name]
    eff_j_data_sys = j_data_sys[['Created','Resolved','Summary']]
    eff_j_data_sys.plot.barh(x=eff_j_data_sys['Resolved']-eff_j_data_sys['Created'],y=range(0,len(eff_j_data_sys)))

Essentially, I have unique hostnames in a larger pandas dataframe, each with between 1 and N issues. In the for loop, I simply iterate through the unique hostnames (sys_name) and grab all the issues related to that hostname in j_data_sys. I then take the times at which each issue was created and resolved, as well as the Summary of the issue. What I would like to do is shown in the following image: Example Bar Plot

Of course, this could include N issues, each with corresponding start and finish timestamps.

An example dataframe containing this data would be:

           Created            Resolved           Summary
9  2016-04-25 10:29:00 2016-04-26 13:22:00  1 Blade Missing
10 2016-04-25 10:10:00 2016-04-25 10:23:00  Blade in Lockdown

Any other suggestions on how best to represent this data in a time-appropriate way are welcome.

Thank you.

I think you don't need a bar plot, because a bar plot is meant for visualizing the relative distribution of categorical data. One solution is the following approach. Let's suppose that we have your test data in CSV format.

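In [1]: import numpy as np
        import pandas as pd
        import matplotlib.pyplot as plt
        # Load the tickets; "Summary" becomes the index and both date columns are parsed
        df = pd.read_csv("df.txt", parse_dates=["Created", "Resolved"], index_col="Summary")
        # Move both timestamps into a single "date" index, one row per timestamp
        df = df.stack().reset_index().rename(columns={0: "date"}).set_index("date")[["Summary"]]
        # One column per issue: 1.0 where the timestamp belongs to it, NaN elsewhere
        # (np.nan replaces the removed pd.np alias)
        df = pd.get_dummies(df, dtype=float).replace(0, np.nan)
        # Give each issue its own height on the y axis
        for n, col in enumerate(df.columns):
            df[col] = df[col] * n
        df.plot(lw=10, legend=False)
        plt.yticks(np.arange(len(df.columns)), df.columns)
        plt.tight_layout()
        plt.show()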

Basically, the code above moves the "Created" and "Resolved" timestamps into the index of a new dataframe, then gives each issue a numerical value at the timestamps where it occurs and NaN elsewhere. The resulting dataframe is:

In [2]: df
Out[2]: 
                     Summary_1 Blade Missing  Summary_Blade in Lockdown
date                                                                   
2016-04-25 10:29:00                      0.0                        NaN
2016-04-26 13:22:00                      0.0                        NaN
2016-04-25 10:10:00                      NaN                        1.0
2016-04-25 10:23:00                      NaN                        1.0
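
The reshaping can also be followed step by step without a file on disk. The sketch below is only an illustration: it assumes the CSV has the columns Summary, Created and Resolved, and it rebuilds the two example rows from the question as in-memory text. The names csv_text, long_df and wide_df are made up for this example.

```python
import io

import numpy as np
import pandas as pd

# The two example tickets from the question as CSV text.
# The column layout is an assumption about what df.txt would contain.
csv_text = """Summary,Created,Resolved
1 Blade Missing,2016-04-25 10:29:00,2016-04-26 13:22:00
Blade in Lockdown,2016-04-25 10:10:00,2016-04-25 10:23:00
"""

df = pd.read_csv(
    io.StringIO(csv_text),
    parse_dates=["Created", "Resolved"],
    index_col="Summary",
)

# Long form: one row per timestamp, indexed by that timestamp.
long_df = (
    df.stack()
    .reset_index()
    .rename(columns={0: "date"})
    .set_index("date")[["Summary"]]
)
print(long_df)

# Wide form: one column per issue, 1.0 where the timestamp belongs
# to that issue and NaN everywhere else.
wide_df = pd.get_dummies(long_df, dtype=float).replace(0, np.nan)

# Scale each column by its position so every issue is drawn at its own height.
for n, col in enumerate(wide_df.columns):
    wide_df[col] = wide_df[col] * n
print(wide_df)
```

Because each column holds values only at its own two timestamps, a line plot of wide_df draws one thick horizontal segment per issue, from its creation time to its resolution time.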

And the resulting plot:

[image: resulting plot]
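
If horizontal bars are still preferred, as in the question, one possible alternative is matplotlib's Axes.barh with its left argument, which lets each bar start at the creation time and extend for the duration of the issue. This is a minimal sketch, not part of the approach above. It assumes the two example rows are typed in by hand, and the names issues, start and end are hypothetical.

```python
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# The two example tickets from the question, entered by hand for illustration.
issues = pd.DataFrame({
    "Created": pd.to_datetime(["2016-04-25 10:29:00", "2016-04-25 10:10:00"]),
    "Resolved": pd.to_datetime(["2016-04-26 13:22:00", "2016-04-25 10:23:00"]),
    "Summary": ["1 Blade Missing", "Blade in Lockdown"],
})

# Matplotlib represents dates as floating-point day numbers, so convert both
# ends of each issue and use their difference as the bar width (in days).
start = mdates.date2num(issues["Created"].to_numpy())
end = mdates.date2num(issues["Resolved"].to_numpy())

fig, ax = plt.subplots()
# One bar per issue: it starts at the creation time and spans the duration.
bars = ax.barh(y=range(len(issues)), width=end - start, left=start, height=0.4)
ax.set_yticks(range(len(issues)))
ax.set_yticklabels(issues["Summary"])
ax.xaxis_date()      # show the x axis values as dates
fig.autofmt_xdate()  # rotate the date labels so they do not overlap
fig.tight_layout()
```

Calling plt.show() afterwards displays the figure in an interactive session. With this data the second bar is very narrow, because that issue lasted only 13 minutes on an axis that spans more than a day.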

I hope this can help you. Regards.
