Consider the following DataFrame df:
Date Kind
2018-09-01 13:15:32 Red
2018-09-02 16:13:26 Blue
2018-09-04 22:10:09 Blue
2018-09-04 09:55:30 Red
... ...
It has one column with a datetime64[ns] dtype and another with object dtype, which can take only a finite number of values (in this case, 2).
You have to plot a date histogram in which you have:
How is it possible to achieve this using Matplotlib?
I was thinking of doing a set_index and resample as follows:
df.set_index('Date', inplace=True)
df.resample('1d').count()
But this loses the information on the number of items per Kind. I also want to keep any missing day as zero.
Any help is much appreciated.
Use groupby, count and unstack to reshape the DataFrame:
df2 = df.groupby(['Date', 'Kind'])['Kind'].count().unstack('Kind').fillna(0)
Next, resample the DataFrame and sum the counts for each day. This also adds any days that are missing from the DataFrame, with a count of zero (as requested). Then adjust the index to keep only the date part.
df2 = df2.resample('D').sum()
df2.index = df2.index.date
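To make the two steps above concrete, here is a minimal, self-contained sketch that rebuilds the four sample rows from the question and runs the same groupby / unstack / resample chain. The variable names counts and daily are only illustrative (the answer calls both df2), and it assumes Date is still a regular column, i.e. the set_index call from the question has not been run.

```python
import pandas as pd

# Sample rows copied from the question; note that 2018-09-03 is absent.
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2018-09-01 13:15:32",
        "2018-09-02 16:13:26",
        "2018-09-04 22:10:09",
        "2018-09-04 09:55:30",
    ]),
    "Kind": ["Red", "Blue", "Blue", "Red"],
})

# Count rows per (timestamp, Kind), then pivot Kind into columns.
# Combinations that never occur become NaN, so fill them with 0.
counts = df.groupby(["Date", "Kind"])["Kind"].count().unstack("Kind").fillna(0)

# Re-bin to calendar days; a day with no rows sums to zero.
daily = counts.resample("D").sum()

# Keep only the date part so the axis labels are not full timestamps.
daily.index = daily.index.date

print(daily)
```

The printed frame should have one row per day from 2018-09-01 to 2018-09-04, including a row of zeros for 2018-09-03.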
Now plot the DataFrame with stacked=True:
df2.plot(kind='bar', stacked=True)
Alternatively, the plt.bar() function can be used for the final plotting:
import matplotlib.pyplot as plt

cols = df['Kind'].unique()  # all distinct values in the Kind column
ind = range(len(df2))  # one bar position per day
p1 = plt.bar(ind, df2[cols[0]])
p2 = plt.bar(ind, df2[cols[1]], bottom=df2[cols[0]])  # stacked on top of the first kind
Here it is necessary to set the bottom argument of each part to the sum of all the parts that came before.
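If there are more than two kinds, the same idea can be written as a loop that keeps a running total for bottom. The following is only a sketch under some assumptions: the daily counts are typed in by hand rather than computed, the third kind "Green" is hypothetical and not part of the question's data, and the Agg backend is chosen so the script also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hand-written daily counts; "Green" is a hypothetical third kind.
daily = pd.DataFrame(
    {"Blue": [0, 1, 0, 1], "Red": [1, 0, 0, 1], "Green": [2, 0, 1, 0]},
    index=pd.date_range("2018-09-01", periods=4, freq="D").date,
)

fig, ax = plt.subplots()
positions = np.arange(len(daily))   # one x position per day
bottom = np.zeros(len(daily))       # running height of the bars drawn so far

for kind in daily.columns:
    heights = daily[kind].to_numpy()
    ax.bar(positions, heights, bottom=bottom, label=kind)
    bottom += heights               # the next kind starts where this one ends

ax.set_xticks(positions)
ax.set_xticklabels([str(d) for d in daily.index], rotation=45)
ax.legend()
fig.tight_layout()
fig.savefig("stacked_counts.png")   # hypothetical output file name
```

After the loop, bottom holds the total count per day, which is also the full height of each stacked bar.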