I have the following data:
product  Sales_band  Hour_id  sales
prod_1   HIGH              1    200
prod_1   HIGH              3    100
prod_1   HIGH              4    300
prod_1   VERY HIGH         2    100
prod_1   VERY HIGH         5    253
prod_1   VERY HIGH         6    234
I want to add rows based on the Hour_id value. Hour_id can take values from 1 to 10, so for each product and Sales_band the data above should be expanded with a row for every missing hour ID. Expected output (sales = 0 where the hour ID was missing):
product  Sales_band  Hour_id  sales
prod_1   HIGH              1    200
prod_1   HIGH              2      0
prod_1   HIGH              3    100
prod_1   HIGH              4    300
prod_1   HIGH              5      0
prod_1   HIGH              6      0
prod_1   HIGH              7      0
prod_1   HIGH              8      0
prod_1   HIGH              9      0
prod_1   HIGH             10      0
prod_1   VERY HIGH         1      0
prod_1   VERY HIGH         2    100
prod_1   VERY HIGH         3      0
prod_1   VERY HIGH         4      0
prod_1   VERY HIGH         5    253
prod_1   VERY HIGH         6    234
prod_1   VERY HIGH         7      0
prod_1   VERY HIGH         8      0
prod_1   VERY HIGH         9      0
prod_1   VERY HIGH        10      0
How can I achieve this with a pandas DataFrame in Python?
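# For each (product, Sales_band) group, reindex Hour_id to 1..10 and fill missing sales with 0.
# Note the double brackets: selecting several columns after groupby requires a list.
print(df.groupby(['product','Sales_band'])[['Hour_id','sales']]
        .apply(lambda x: x.set_index('Hour_id').reindex(range(1, 11), fill_value=0))
        .reset_index())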
    product  Sales_band  Hour_id  sales
 0  prod_1   HIGH              1    200
 1  prod_1   HIGH              2      0
 2  prod_1   HIGH              3    100
 3  prod_1   HIGH              4    300
 4  prod_1   HIGH              5      0
 5  prod_1   HIGH              6      0
 6  prod_1   HIGH              7      0
 7  prod_1   HIGH              8      0
 8  prod_1   HIGH              9      0
 9  prod_1   HIGH             10      0
10  prod_1   VERY HIGH         1      0
11  prod_1   VERY HIGH         2    100
12  prod_1   VERY HIGH         3      0
13  prod_1   VERY HIGH         4      0
14  prod_1   VERY HIGH         5    253
15  prod_1   VERY HIGH         6    234
16  prod_1   VERY HIGH         7      0
17  prod_1   VERY HIGH         8      0
18  prod_1   VERY HIGH         9      0
19  prod_1   VERY HIGH        10      0
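An alternative that avoids groupby/apply is to build the full grid of (product, Sales_band, Hour_id) rows and left-merge the sales onto it. The sketch below is only one possible way to do this. It assumes pandas 1.2 or later (for how="cross"), rebuilds the sample data from the question, and uses illustrative variable names (pairs, hours, grid, expanded) that are not part of the original post.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "product": ["prod_1"] * 6,
    "Sales_band": ["HIGH"] * 3 + ["VERY HIGH"] * 3,
    "Hour_id": [1, 3, 4, 2, 5, 6],
    "sales": [200, 100, 300, 100, 253, 234],
})

# Only the (product, Sales_band) pairs that actually occur in the data.
pairs = df[["product", "Sales_band"]].drop_duplicates()

# Every hour the data is allowed to contain (1 to 10 inclusive).
hours = pd.DataFrame({"Hour_id": range(1, 11)})

# Cross join: one row per observed pair per hour.
grid = pairs.merge(hours, how="cross")

# Attach the known sales; hours with no match get NaN, which becomes 0.
expanded = (
    grid.merge(df, on=["product", "Sales_band", "Hour_id"], how="left")
        .fillna({"sales": 0})
        .astype({"sales": int})
)

print(expanded)
```

Because the grid is built from the observed pairs only, a band that never appears for a product is not invented. If every product should get every band, the pairs frame would have to be built differently.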