
Plotting categorical data settings over time in Python

I'm having trouble using matplotlib to plot how several settings change over time. I would like the result to look like a stacked horizontal bar chart, even though the data is categorical.

import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({
    'Setting1': ['A', 'A', 'C'],
    'Setting2': ['B', 'B', 'B'],
    'Setting3': ['D', 'D', 'C'],
    'TimeStr': ['2021-06-12 13:00:00', '2021-06-12 13:00:01', '2021-06-12 13:00:02']
})
df['TimeStr'] = pd.to_datetime(df['TimeStr'])
fig,ax = plt.subplots()
plt.barh(df['Setting1'],df['TimeStr'])
plt.barh(df['Setting2'],df['TimeStr'])
plt.barh(df['Setting3'],df['TimeStr'])
plt.show()

The desired output would look something like this:

         |-------------------------
Setting3 |         D       |  C   |
         |-------------------------
         |-------------------------
Setting2 |           B            |
         |-------------------------
         |-------------------------
Setting1 |       A         | C    |
         |-------------------------
         |____________________________
                     Time

Currently my y axis shows the values A, B, C and D rather than the setting names (Setting1, Setting2, Setting3). Is there a way to achieve this using matplotlib?

There are lots of ways to do this; here is one implementation. Your dataframe isn't laid out in a way that suits bar plots. Usually a bar plot has one column per set of bars, which makes it easy to stack them: each set simply starts at the left position where the previous one ended. Depending on how your data are structured, you may need a groupby to get the counts for your categorical variables; a groupby tutorial can help with that. If you need more precise time measurements, you could use pd.Timedelta objects as your dataframe values. Tutorials on stacked bar plots with matplotlib cover the stacking pattern in more detail.
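As a rough sketch of the groupby step mentioned above, the question's dataframe can be reshaped into one row per setting and one column per category. This assumes the samples are evenly spaced, so that the number of rows a setting spends at a value stands in for time; the names long_df and counts are made up for illustration.

```python
import pandas as pd

# The question's data: one column per setting, one row per timestamp.
df = pd.DataFrame({
    'Setting1': ['A', 'A', 'C'],
    'Setting2': ['B', 'B', 'B'],
    'Setting3': ['D', 'D', 'C'],
    'TimeStr': ['2021-06-12 13:00:00', '2021-06-12 13:00:01', '2021-06-12 13:00:02'],
})
df['TimeStr'] = pd.to_datetime(df['TimeStr'])

# Long format: one row per (time, setting, value) observation.
long_df = df.melt(id_vars='TimeStr', var_name='Setting', value_name='Value')

# Assumption: evenly spaced samples, so a row count approximates a duration.
# Rows are settings, columns are categories, missing combinations become 0.
counts = (
    long_df.groupby(['Setting', 'Value'])
    .size()
    .unstack(fill_value=0)
)
print(counts)
```

Note that counting discards the order in which the values occurred, so a setting that switches back and forth would be drawn as one merged block per category.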


import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import matplotlib as mpl
sns.set(style='white') # use seaborn's white style and default color palette

#make dataframe
df = pd.DataFrame({
    'Setting' : ['S1', 'S2', 'S3'], 
    'A' : [0, 0, 1],
    'B' : [0, 3, 0],
    'C' : [1, 0, 1],
    'D' : [2, 0, 0]
    })

fig, ax = plt.subplots(figsize=(12,5))
bars = ['A', 'B', 'C', 'D'] # order in which the categories are stacked
left = 0                    # left edge of the next segment in each row

# For each category column (A, B, C, D), draw one horizontal segment per
# setting starting at `left`, then advance `left` by that column's widths.
for bar in bars:
    ax.barh(
        y = np.arange(len(df)) / 1.2, # divide by 1.2 to pull the rows closer together
        width = df[bar].values,
        left = left,
        label = bar,
        height = 0.5
    )
    left = left + df[bar]

# Put one y tick per row and label it with the setting name
labels = ['Setting 1', 'Setting 2', 'Setting 3']
ax.yaxis.set_major_locator(mpl.ticker.FixedLocator(np.arange(len(df)) / 1.2))
ax.yaxis.set_major_formatter(mpl.ticker.FixedFormatter(labels))

# Legend above the plot, hidden spines, x label and y limits
ax.legend(edgecolor='w', fontsize=14, ncol=4, bbox_to_anchor=(0.25, 1), loc='lower left')
for side in ['top', 'right', 'bottom']:
    ax.spines[side].set_visible(False)
ax.set_xlabel('Time for each setting')
ax.set_ylim(-0.3, 1.95)
plt.show()
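If the order of the values over time matters, an alternative is to draw the timeline straight from the question's dataframe. The following is only a sketch under two assumptions that are not in the original posts: each sample holds until the next timestamp, and the last sample holds for one more typical step. The names step, end_times, segments and colors are illustrative.

```python
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

df = pd.DataFrame({
    'Setting1': ['A', 'A', 'C'],
    'Setting2': ['B', 'B', 'B'],
    'Setting3': ['D', 'D', 'C'],
    'TimeStr': ['2021-06-12 13:00:00', '2021-06-12 13:00:01', '2021-06-12 13:00:02'],
})
df['TimeStr'] = pd.to_datetime(df['TimeStr'])

# Assumption: a sample lasts until the next timestamp; the final sample
# is extended by the median sampling interval so it has a visible width.
step = df['TimeStr'].diff().median()
end_times = df['TimeStr'].shift(-1).fillna(df['TimeStr'].iloc[-1] + step)

settings = ['Setting1', 'Setting2', 'Setting3']
categories = sorted(pd.unique(df[settings].values.ravel()))
colors = {cat: f'C{i}' for i, cat in enumerate(categories)}  # one color per category

fig, ax = plt.subplots(figsize=(12, 5))
segments = []  # (setting, value, start, end) for every drawn block

for row, setting in enumerate(settings):
    # A new run starts whenever the value differs from the previous row.
    run_id = (df[setting] != df[setting].shift()).cumsum()
    for _, run in df.groupby(run_id):
        value = run[setting].iloc[0]
        start = run['TimeStr'].iloc[0]
        end = end_times.loc[run.index[-1]]
        segments.append((setting, value, start, end))

        # Convert to matplotlib date numbers so widths can be computed.
        x0 = mdates.date2num(start.to_pydatetime())
        x1 = mdates.date2num(end.to_pydatetime())
        ax.barh(y=row, width=x1 - x0, left=x0, height=0.5,
                color=colors[value], edgecolor='white')
        ax.text((x0 + x1) / 2, row, value, ha='center', va='center')

ax.set_yticks(range(len(settings)))
ax.set_yticklabels(settings)
ax.xaxis_date()  # interpret x values as dates
ax.xaxis.set_major_formatter(mdates.DateFormatter('%H:%M:%S'))
ax.set_xlabel('Time')
```

Call plt.show() afterwards to display the figure. Consecutive identical values are merged into a single labelled block, which matches the layout sketched in the question.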
