
Python plotting on/off data using Matplotlib

I'm trying to plot whether a bunch of devices are online or offline. The devices give a signal 1 when they come online and a signal 0 when they go offline. In between, there's no data.

For just one device I use a step plot (with where='post'), which works pretty well. Now I want to show, with a line, when one or more devices are online.

Does anyone have any tips/tricks on how to visualize this dataset? I've tried adding extra rows just before each signal to get a more continuous dataset and then plot the value of OnOff, but then I lose the categories. Do I need to convert this to a broken_barh plot? Or any other ideas?

[Example plot]

Data:

from io import StringIO

import pandas as pd
import matplotlib.pyplot as plt

TESTDATA = u"""\
Index;OnOff;Device
12-10-2021 10:04:04;1;device1
12-10-2021 10:04:12;0;device3
12-10-2021 10:05:05;1;device2
12-10-2021 19:05:11;0;device2
13-10-2021 05:25:17;1;device2
13-10-2021 19:26:22;0;device2
14-10-2021 15:44:44;1;device2
14-10-2021 20:54:12;0;device2
15-10-2021 04:21:42;1;device2
15-10-2021 09:15:11;0;device2
15-10-2021 17:05:05;0;device1
15-10-2021 17:05:25;1;device3
15-10-2021 17:56:45;1;device1
15-10-2021 17:57:09;1;device2
15-10-2021 21:10:20;0;device2
16-10-2021 01:51:50;1;device2
19-10-2021 10:00:13;0;device1
19-10-2021 10:04:19;0;device2
"""

df = pd.read_csv(StringIO(TESTDATA), index_col=0, sep=';', engine='python')
df.index = pd.to_datetime(df.index, format='%d-%m-%Y %H:%M:%S')
print(df)

# plot
fig, ax = plt.subplots(figsize=[16,9])

devices = list(set(df['Device']))
devices.sort(reverse=True)

for device in devices:
    ax.plot(df.index[df['Device'] == device], df['Device'][df['Device'] == device], label=device)
plt.show()

The problem is in the ax.plot parameters. ax.plot requires x and y, e.g. ax.plot(x, y). Your x and y are:

x - df.index[df['Device'] == device] - this is correct
y - df['Device'][df['Device'] == device] - this is not correct, because it plots the device name instead of the on/off signal

Change df['Device'][df['Device'] == device] to df.loc[df['Device'] == device, 'OnOff'].

df.loc works by filtering rows and then columns:

df.loc[row_filter, column_filter]
row_filter = df['Device'] == device # give me all rows where the 'Device' column's value == the device variable's value
column_filter = 'OnOff' # give me just the OnOff column
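
To make the row/column filtering concrete, here is a minimal sketch on a tiny made-up frame (the `demo` data is an assumption for illustration, not the question's dataset):

```python
import pandas as pd

# Tiny made-up frame (not the question's data) to show row/column filtering.
demo = pd.DataFrame(
    {"OnOff": [1, 0, 1, 0], "Device": ["device1", "device2", "device2", "device1"]},
    index=pd.to_datetime(
        ["2021-10-12 10:00", "2021-10-12 11:00", "2021-10-12 12:00", "2021-10-12 13:00"]
    ),
)

row_filter = demo["Device"] == "device2"  # boolean mask, True for the device2 rows
column_filter = "OnOff"                   # a single column label

# Rows are filtered first, then the column is picked; the result is a Series.
selected = demo.loc[row_filter, column_filter]
print(selected.tolist())  # [0, 1]
```

The result keeps the datetime index of the matching rows, so it lines up with df.index[row_filter] when both are passed to a plotting call.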

The graph you will see may not be what you want.

You may want to replace ax.plot with ax.step, but the step lines of the different devices will overlap and won't be very readable.

The final solution may be to draw three axes, one per device, on a shared x axis:

# plot
fig, axs = plt.subplots(3,1, figsize=[16,9], sharex=True)

devices = list(set(df['Device']))
devices.sort(reverse=True)

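for device_idx, device in enumerate(devices):
    is_device = df['Device'] == device
    # where='post' keeps each status until the device's next signal
    axs[device_idx].step(df.index[is_device], df.loc[is_device, 'OnOff'], where='post', label=device)
    axs[device_idx].legend()
plt.show()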
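
If separate axes feel like too much, another option (a sketch, not part of the original answer) is to keep a single axes and shift each device's 0/1 signal up by its row number so the step lines cannot overlap. The `events` frame and the 0.8 scaling factor are assumptions chosen for illustration:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Made-up events for two devices (assumption, not the question's data).
events = pd.DataFrame(
    {"OnOff": [1, 1, 0, 0], "Device": ["device1", "device2", "device2", "device1"]},
    index=pd.to_datetime(
        ["2021-10-12 10:00", "2021-10-12 11:00", "2021-10-12 15:00", "2021-10-12 18:00"]
    ),
)

devices = sorted(events["Device"].unique())
fig, ax = plt.subplots(figsize=(10, 3))

for pos, device in enumerate(devices):
    sub = events[events["Device"] == device]
    # Scale the 0/1 signal to 0..0.8 and lift it to the device's own row.
    ax.step(sub.index, sub["OnOff"] * 0.8 + pos, where="post", label=device)

# Label each row with its device name instead of a number.
ax.set_yticks(range(len(devices)))
ax.set_yticklabels(devices)
```

With an interactive backend, drop the matplotlib.use line and call plt.show() at the end.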


Datetime objects can indeed be awkward to work with, because not all pandas/NumPy/matplotlib functions accept every datetime type, and some interpret them differently. However, we can convert datetimes into matplotlib dates, which are plain numbers, making our life easier (kinda):

import pandas as pd 
import matplotlib.pyplot as plt
from matplotlib.dates import date2num

#test data generation
from io import StringIO
TESTDATA = u"""\
Index;OnOff;Device
12-10-2021 10:04:04;1;device1
12-10-2021 10:04:12;0;device3
12-10-2021 10:05:05;1;device2
12-10-2021 19:05:11;0;device2
13-10-2021 05:25:17;1;device2
13-10-2021 19:26:22;0;device2
14-10-2021 15:44:44;1;device2
14-10-2021 20:54:12;0;device2
15-10-2021 04:21:42;1;device2
15-10-2021 09:15:11;0;device2
15-10-2021 17:05:05;0;device1
15-10-2021 17:05:25;1;device3
15-10-2021 17:56:45;1;device1
15-10-2021 17:57:09;1;device2
15-10-2021 21:10:20;0;device2
16-10-2021 01:51:50;1;device2
17-10-2021 10:00:13;0;device1
19-10-2021 10:04:19;0;device2
"""
df = pd.read_csv(StringIO(TESTDATA), index_col=0, sep=';', engine='python')
df.index = pd.to_datetime(df.index, format='%d-%m-%Y %H:%M:%S')

#not necessary if presorted but we don't want to push our luck
df = df.sort_index()

fig, ax = plt.subplots(figsize=(10, 4))

#convert dates into matplotlib format
df["Mpl_date"] = df.index.map(date2num)
#group by device
gb = df.groupby("Device")  #groups come out sorted by device name

for pos, (_device_name, device_df) in enumerate(gb):  
    #make sure the entire datetime range is covered for each device
    prepend_df = (None, df.iloc[0].to_frame().T)[int((device_df.iloc[0] != df.iloc[0]).any())]
    append_df = (None, df.iloc[-1].to_frame().T)[int((device_df.iloc[-1] != df.iloc[-1]).any())]
    device_df = pd.concat([prepend_df, device_df, append_df])  
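    #the first and last rows get the opposite status of their neighbours
    onoff_col = device_df.columns.get_loc("OnOff")
    device_df.iloc[[0, -1], onoff_col] = 1 - device_df["OnOff"].iloc[[1, -2]].to_numpy()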
    #calculate time differences as broken_barh expects a list of tuples (x_start, x_width)
    device_df["Mpl_diff"] = -device_df["Mpl_date"].diff(-1)

    #and plot each broken barh, starting with the first status 1
    ax.broken_barh(list(zip(device_df["Mpl_date"], device_df["Mpl_diff"]))[1- device_df["OnOff"].iloc[0]::2], (pos-0.1, 0.2))

ax.set_yticks(range(gb.ngroups))
ax.set_yticklabels(gb.groups.keys()) 
ax.xaxis_date()
ax.set_xlabel("Date")
plt.tight_layout()       
plt.show()

Sample output: [image]

Most of the code is only needed to handle cases where the status at the beginning or end of the time range is not explicitly given in the original dataframe. Having the status encoded as 1 and 0 makes the coding easier, because it can be used directly as an index.
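
The core idea, turning a run of on/off events into (start, width) pairs, can be sketched on its own. This is a simplified illustration for a single device with made-up timestamps; it assumes the events are sorted and strictly alternate, and it skips the padding of the start and end of the range that the code above does:

```python
import pandas as pd

# Hypothetical on/off events for a single device, already sorted by time.
events = pd.Series(
    [1, 0, 1, 0],
    index=pd.to_datetime(
        ["2021-10-12 10:00", "2021-10-12 12:00", "2021-10-13 08:00", "2021-10-13 09:30"]
    ),
    name="OnOff",
)

# Each event lasts until the next one; the last event has no known end.
starts = events.index[:-1]
widths = events.index[1:] - events.index[:-1]

# Keep only the stretches that begin with an "on" signal.
is_on = events.to_numpy()[:-1] == 1
on_intervals = list(zip(starts[is_on], widths[is_on]))

for start, width in on_intervals:
    print(start, width)
```

The answer's code converts the dates with date2num before handing the pairs to ax.broken_barh; this sketch stops at the intervals themselves.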

PS The first bar of device 3 is visible in the original plot but not in the downsampled image stored here on SO.

# First, I assume that every device starts in an unknown state rather than
# assuming it is off, so let's add one row at the beginning.
new_index_first_element = df.index[0] - pd.Timedelta(seconds=1)
new_index = [new_index_first_element] + df.index.to_list()

devices = sorted(df.Device.unique())

# Let's create a new dataframe where each device has its own column and
# each row tracks the state of every device.
df2 = pd.DataFrame(index=new_index, columns=devices)

# I have to be able to refer to the previous row, so I use iloc instead of loc.
for i_iloc in range(1, len(df2)):
    # first copy the previous status of all devices to the current row
    df2.iloc[i_iloc] = df2.iloc[i_iloc - 1]

    # now update the status of the device that changed
    current_row_idx = df2.index[i_iloc]  # a single timestamp, as .at needs scalar labels
    device_to_update = df.loc[current_row_idx, 'Device']
    status_to_update = df.loc[current_row_idx, 'OnOff']
    df2.at[current_row_idx, device_to_update] = status_to_update

df2


This is what the DataFrame looks like. It has an additional first row of NaNs because we do not know the status of the devices before their first signal.
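
A similar state table can probably be built without an explicit loop by pivoting and forward-filling. This is a sketch on made-up data (the `events` frame is an assumption); it requires every timestamp to be unique and, unlike the loop, does not add an extra leading row:

```python
import pandas as pd

# Made-up events (assumption, not the question's data); timestamps are unique.
events = pd.DataFrame(
    {"OnOff": [1, 1, 0, 0], "Device": ["device1", "device2", "device1", "device2"]},
    index=pd.to_datetime(
        ["2021-10-12 10:00", "2021-10-12 11:00", "2021-10-12 12:00", "2021-10-12 13:00"]
    ),
)

# One column per device; NaN where a device sent nothing at that timestamp.
wide = events.pivot(columns="Device", values="OnOff")

# Carry the last known status forward; a leading NaN means "status unknown".
states = wide.ffill()

# True wherever at least one device is known to be online.
any_online = states.eq(1).any(axis=1)

print(states)
print(any_online.tolist())  # [True, True, True, False]
```

The any_online series is one way to get the "one or more devices online" signal the question asks about; it can be drawn with a step plot like any single device.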

# and plot
fig, ax = plt.subplots(figsize=[16,9])
df2.plot(kind='bar', stacked=True, color=['red', 'skyblue', 'green'], ax=ax)


I don't think a broken_barh plot will do a good job here; this stacked bar plot works much better.
