Plot histogram / curve on time axis

I have a feeling there is a very simple way of doing this. I'm trying to plot a timeline of tasks running in an environment, with two plots on the same diagram:

  1. the task run times as a broken_barh
  2. an overall load curve (or a histogram) based on the number of tasks running at each time point, drawn with lower opacity or as a line.

In the example there are 6 tasks (A to F) running for various lengths of time, with different start times. They are plotted exactly as I need (1/), in a Gantt-like chart with time on the X axis.

import numpy as np
import pandas as pd
%matplotlib inline
import matplotlib as mpl
from matplotlib import pyplot as plt

cols=['ID','From','To']

df = pd.DataFrame([['A', 736758.993, 736758.995], ['B', 736758.995, 736758.998],
                   ['C', 736758.994, 736758.996], ['D', 736758.996, 736758.997],
                   ['E', 736758.996, 736758.997], ['F', 736758.995, 736758.996]],
                   columns=cols)

df['Diff'] = df['To']-df['From']

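fig, ax = plt.subplots()
# one bar per task: x-extent is (start, duration), y-extent is centered on the row index
for i, row in df.iterrows():
    values = [(row['From'], row['Diff'])]
    ax.broken_barh(values, (i - 0.4, 0.8), color=np.random.rand(3))

# treat the x values as Matplotlib date numbers
ax.xaxis_date()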

To this I would like to add 2/ a curve showing the number of active tasks at each time (1 between 23:51 and 23:52, 2 between 23:52 and 23:53, etc., peaking around 23:54).

The problem is that I cannot just draw a histogram of the start times, since the different tasks overlap in time. Do you know a decent way to create such a histogram?

I am pretty sure there are cleaner ways to approach this. The floating-point problems were especially annoying when creating the histogram. The first part is a simple one-liner, though: as suggested, use hlines and increase the linewidth to draw the bars.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.cm as cm

df = pd.DataFrame([['A', 736758.993, 736758.995], ['B', 736758.995, 736758.998],
                   ['C', 736758.994, 736758.996], ['D', 736758.994, 736758.997],
                   ['E', 736758.997, 736758.998], ['F', 736758.995, 736758.999]],
                   columns = ['ID','From','To'])

#create two subplots with shared x axis
fig, (ax1, ax2) = plt.subplots(2, 1, sharex = True)
#plot1 - Gantt chart for individual IDs
ax1.hlines(df.ID, df.From, df.To, colors = cm.inferno(df.index/len(df)), linewidth = 20)

#plot 2 - make table of time series for each ID - multiply by 1000 to avoid float problems
hist_count = df.apply(lambda row: pd.Series(np.arange(1000 * row["From"], 1000 * row["To"])), axis = 1)
hist_count = pd.melt(hist_count)["value"].dropna().astype(int)
#find borders for bins 
min_time = hist_count.min(axis = 0)
max_time = hist_count.max(axis = 0)
#plot 2 histogram - add 0.0001 to prevent arbitrary binning due to float problems
ax2.hist(hist_count / 1000 + 0.0001, range = (min_time / 1000, (max_time + 1) / 1000), bins = max_time - min_time + 1)
ax2.xaxis_date()

plt.show()

Output from sample data set: [image]
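
One possibly cleaner way to get the load curve, which avoids the binning and its float issues, is to treat every start as a +1 event and every end as a -1 event, and take a running sum over the sorted event times. The sketch below is not part of the code above: the helper names active_counts and overlay_load are made up for illustration, and it assumes a task counts as active on the half-open interval [From, To), so a task ending at the same instant another one starts is not counted twice.

```python
import numpy as np


def active_counts(starts, ends):
    """Return (times, counts); counts[i] tasks are active from times[i] up to times[i + 1].

    Hypothetical helper. Assumes each task is active on the half-open
    interval [start, end).
    """
    starts = np.asarray(starts, dtype=float)
    ends = np.asarray(ends, dtype=float)
    # every start contributes +1, every end contributes -1
    times = np.concatenate([starts, ends])
    deltas = np.concatenate([np.ones(len(starts), dtype=int),
                             -np.ones(len(ends), dtype=int)])
    # merge events that share the same timestamp into one net change
    uniq, inverse = np.unique(times, return_inverse=True)
    net = np.zeros(len(uniq), dtype=int)
    np.add.at(net, inverse, deltas)
    # the running sum of net changes is the number of active tasks
    return uniq, np.cumsum(net)


def overlay_load(ax, starts, ends):
    """Draw the load curve on a secondary y axis that shares the x axis of `ax`."""
    load_ax = ax.twinx()
    times, counts = active_counts(starts, ends)
    # where="post" holds each count until the next event time
    load_ax.step(times, counts, where="post", color="black", alpha=0.5)
    load_ax.set_ylabel("active tasks")
    return load_ax
```

With the data frame from the answer, calling overlay_load(ax1, df["From"], df["To"]) before plt.show() should put the step curve on the same diagram as the bars instead of in a second subplot.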
