I am trying to achieve the following plot or table with pandas:
Here is my data (the numbers will not add up to what's in the photo):
TIME_COL TXT_COL
0 1/2/2017 text
1 1/3/2017 text
2 1/5/2017 text
3 1/2/2017 text
4 7/2/2017 text
5 12/2/2017 text
6 9/2/2017 text
Can anyone help me with the following:
1. What is the correct way to arrange/reshape my data?
2. How should I approach the visual side of it, so that I get the same or a similar result to the one shown in the photo?
I already have code which helps me group my data per month, but this is not exactly what I am looking for. Here is my grouping code:
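import pandas as pd

df = pd.read_csv('some_file.csv')
df = df[['TIME_COL', 'TXT_COL']]
df['TIME_COL'] = pd.to_datetime(df['TIME_COL'])
df.index = df['TIME_COL']  # use the parsed dates as a DatetimeIndex
# Count rows per month. resample(..., how='count') was removed from pandas;
# the 'M' alias is spelled 'ME' in pandas 2.2 and later.
df = df['TXT_COL'].resample('M').count()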
The result looks something like this:
TIME_COL
2016-09-30 5
2016-10-31 7
2016-11-30 0
2016-12-31 2
2017-01-31 5
2017-02-28 2
2017-03-31 11
2017-04-30 10
2017-05-31 10
2017-06-30 7
2017-07-31 7
2017-08-31 8
2017-09-30 6
2017-10-31 7
2017-11-30 2
2017-12-31 4
2018-01-31 7
Thanks!
IIUC, you could do something like this:
df['Year'] = df.index.year
df['Month'] = df.index.strftime('%b')
df.pivot_table('TIME_COL','Year','Month', aggfunc='mean', fill_value=0).style.bar(axis=1)
One way to keep the months in calendar order is to add the month number as a second column level, let pivot_table sort on it, and then drop that level afterwards, like this:
df['Year'] = df.index.year
df['Month'] = df.index.strftime('%b')
df['MonthNo'] = df.index.month
df_pvt = df.pivot_table(values='TIME_COL',index='Year',columns=['MonthNo','Month'], aggfunc='mean', fill_value=0)
df_pvt.columns = df_pvt.columns.droplevel(0)
df_pvt.style.bar(axis=1)
Updated to add a Total column:
df['Year'] = df.index.year
df['Month'] = df.index.strftime('%b')
df['MonthNo'] = df.index.month
df_pvt = df.pivot_table(values='TIME_COL',index='Year',columns=['MonthNo','Month'], aggfunc='mean', fill_value=0)
df_pvt.columns = df_pvt.columns.droplevel(0)
df_pvt = pd.concat([df_pvt,df_pvt.sum(1).rename('Total')],axis=1)
df_pvt.style.bar(axis=1,subset=df_pvt.columns[:-1])
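For reference, here is a minimal self-contained sketch of the same Year x Month reshaping, built from the sample rows in the question. It counts rows with groupby/size rather than pivot_table, and it assumes the dates are month/day/year, which the question does not state. The names dates and counts are made up for this example.

```python
import calendar

import pandas as pd

# Sample rows from the question. Assumption: dates are month/day/year.
df = pd.DataFrame({
    "TIME_COL": ["1/2/2017", "1/3/2017", "1/5/2017", "1/2/2017",
                 "7/2/2017", "12/2/2017", "9/2/2017"],
    "TXT_COL": ["text"] * 7,
})
dates = pd.to_datetime(df["TIME_COL"], format="%m/%d/%Y")

# Count rows per (year, month) and spread the months across the columns.
counts = (
    df.groupby([dates.dt.year.rename("Year"), dates.dt.month.rename("Month")])
      .size()
      .unstack(fill_value=0)
      # Make sure all 12 months are present, in calendar order.
      .reindex(columns=list(range(1, 13)), fill_value=0)
)

# Swap month numbers for abbreviations (Jan, Feb, ...) without re-sorting.
counts.columns = [calendar.month_abbr[m] for m in counts.columns]
counts["Total"] = counts.sum(axis=1)
print(counts)
```

In a notebook, counts.style.bar(axis=1, subset=counts.columns[:-1]) should then draw the in-cell bars while leaving the Total column unstyled, as in the snippet above.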
Fake returns:
import numpy as np
import pandas as pd
range_ = pd.date_range(start='2015-01-01', end='2017-12-31', freq='D')
df = pd.DataFrame({'returns': np.random.randn(len(range_))}, index=range_)
Add year and month columns:
df['year'] = df.index.year
df['month'] = df.index.month
monthly_returns = df.groupby(['year', 'month']).sum()
monthly_returns.unstack()
This is going to give you a table like:
month 1 2 3 4 5 6 7 8 9 10 11 12
year
2015 -4.2 4.7 2.5 4.9 4.4 6.9 -2.5 8.8 5.5 0.5 -5.5 -1.6
2016 10.5 1.1 1.6 1.0 9.9 0.2 -0.1 2.1 4.3 -1.5 10.8 2.5
2017 2.8 -9.8 4.9 7.4 14.8 2.5 -6.2 4.1 -0.9 0.3 7.4 1.0
Then you can plot it using:
import matplotlib.pyplot as plt
plt.imshow(monthly_returns.unstack().to_numpy(), cmap='hot', interpolation='nearest')
plt.show()
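A bare imshow leaves the axes labelled with array positions, so it may help to put the years and months on the ticks. The following is a rough, self-contained sketch of that idea rather than a definitive recipe. The seed, the variable names and the output filename monthly_returns.png are assumptions made for this example, and the Agg backend is selected only so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to open a window instead
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Fake daily returns, as above (seeded so the example is repeatable).
rng = np.random.default_rng(0)
days = pd.date_range("2015-01-01", "2017-12-31", freq="D")
returns = pd.Series(rng.standard_normal(len(days)), index=days)

# Rows are years, columns are months 1..12.
table = returns.groupby([days.year, days.month]).sum().unstack()

fig, ax = plt.subplots()
im = ax.imshow(table.to_numpy(), cmap="hot", aspect="auto")

# Label the ticks with the actual months and years.
ax.set_xticks(range(len(table.columns)))
ax.set_xticklabels(table.columns)
ax.set_yticks(range(len(table.index)))
ax.set_yticklabels(table.index)
ax.set_xlabel("month")
ax.set_ylabel("year")
fig.colorbar(im, ax=ax)

fig.savefig("monthly_returns.png")  # hypothetical output filename
```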