I have a list of dates, e.g. dates_list = [201701, 201702, 201703, 201704]. This is user input specifying the dates wanted for a specific report.
I also have a database with three columns: id, date and value.
Sometimes my database doesn't have records for every date the user asked for (e.g. it only has records for 201701 and 201702). df is my database. I have this command:
raw = pd.pivot_table(df, index=['id'],
                     columns=['date'], values=['value'],
                     aggfunc=[np.sum], fill_value=0, margins=False)
Which, of course, returns a pivot table with only two columns: 201701 and 201702.
I want to know whether it is possible to use dates_list as the column labels when the pivot table is built, so that it returns columns full of zeros for 201703 and 201704. If that is not possible, does anyone know the best approach to this problem?
Thanks in advance
Sample data:
df = pd.DataFrame({'id': [1, 1, 2, 1, 2],
                   'date': [201701, 201701, 201701, 201702, 201702],
                   'value': [0.04, 0.02, 0.07, 0.08, 1.0]})
df
date id value
0 201701 1 0.04
1 201701 1 0.02
2 201701 2 0.07
3 201702 1 0.08
4 201702 2 1.00
raw = pd.pivot_table(df, index=['id'], columns=['date'], values=['value'],
                     aggfunc=[np.sum], fill_value=0, margins=False)
sum
value
date 201701 201702
id
1 0.06 0.08
2 0.07 1.00
date_list = [201701, 201702, 201703, 201704]
raw.reindex(columns=date_list, fill_value=0)
But this raises ValueError: Buffer dtype mismatch, expected 'Python object' but got 'long long'
You can call reindex after pivot_table, passing your list of dates as the columns:
pd.pivot_table(df, index=['id'],
               columns=['date'], values=['value'],
               aggfunc=[np.sum], fill_value=0, margins=False).\
    reindex(columns=dates_list, fill_value=0)
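The ValueError in the question most likely comes from the shape of the column labels rather than from reindex itself: passing lists for values and aggfunc gives the pivot table three column levels (function, value name, date), and a flat list of dates cannot be matched against those. The sketch below assumes the sample df from the question; the name flat is only illustrative, and it uses the string 'sum' in place of np.sum, which should produce the same aggregation. It drops the two outer levels before reindexing.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 2, 1, 2],
                   'date': [201701, 201701, 201701, 201702, 201702],
                   'value': [0.04, 0.02, 0.07, 0.08, 1.0]})
date_list = [201701, 201702, 201703, 201704]

# List arguments for values/aggfunc produce MultiIndex columns:
# (function name, value column, date).
raw = pd.pivot_table(df, index=['id'], columns=['date'], values=['value'],
                     aggfunc=['sum'], fill_value=0)
print(raw.columns.nlevels)

# Drop the two outer levels so only the date labels remain,
# then add the missing dates as zero-filled columns.
flat = raw.droplevel([0, 1], axis=1)
flat = flat.reindex(columns=date_list, fill_value=0)
print(flat)
```

This keeps the original pivot_table call intact, at the cost of losing the 'sum' and 'value' header rows in the result.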
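Update: pass scalars rather than lists for index, columns, values and aggfunc, so that the resulting columns are a flat index of dates that reindex can match against a plain list. The reindex call below omits fill_value, so the missing column shows NaN; pass fill_value=0 to reindex to get zeros instead.
pd.pivot_table(df, index='id', columns='date', values='value',
               aggfunc='sum', fill_value=0, margins=False).\
    reindex(columns=[201701, 201702, 201703])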
Out[115]:
date 201701 201702 201703
id
1 0.06 0.08 NaN
2 0.07 1.00 NaN
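Putting the pieces together, a minimal sketch of the whole approach might look like the following. It assumes the sample df and the dates_list from the question; report is just an illustrative name. The fill_value in pivot_table covers id/date combinations missing among dates that do occur, while the fill_value in reindex covers dates absent from the data altogether.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 2, 1, 2],
                   'date': [201701, 201701, 201701, 201702, 201702],
                   'value': [0.04, 0.02, 0.07, 0.08, 1.0]})
dates_list = [201701, 201702, 201703, 201704]  # dates requested by the user

# Scalar arguments keep the columns as a flat index of dates.
report = (
    pd.pivot_table(df, index='id', columns='date', values='value',
                   aggfunc='sum', fill_value=0)
    # Force exactly the requested dates, in order; absent dates become 0.
    .reindex(columns=dates_list, fill_value=0)
)
print(report)
```

Because reindex keeps only the labels it is given, any date present in df but not in dates_list is dropped from the report as well.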