Add rows for missing data grouped by another column in Pandas DataFrame
I have a Pandas dataframe where certain products are missing for certain dates. I want to add those rows to the dataframe and assign them a sales value of 0. How can I do that?
# Sample dataframe
import pandas as pd
df = pd.DataFrame({
'date': ['2020-01-01', '2020-01-01', '2020-01-01', '2020-01-02', '2020-01-02', '2020-01-03', '2020-01-03'],
'product': ['glass', 'clothes', 'food', 'glass', 'food', 'glass', 'clothes'],
'sales': [100, 120, 50, 90, 60, 110, 130]
})
         date  product  sales
0  2020-01-01    glass    100
1  2020-01-01  clothes    120
2  2020-01-01     food     50
3  2020-01-02    glass     90
4  2020-01-02     food     60
5  2020-01-03    glass    110
6  2020-01-03  clothes    130
## 'clothes' is missing for 2020-01-02 and 'food' is missing for 2020-01-03
## What I want to get:
         date  product  sales
0  2020-01-01    glass    100
1  2020-01-01  clothes    120
2  2020-01-01     food     50
3  2020-01-02    glass     90
4  2020-01-02  clothes      0
5  2020-01-02     food     60
6  2020-01-03    glass    110
7  2020-01-03  clothes    130
8  2020-01-03     food      0
You can do this with unstack()/stack():
(df.set_index(['date','product'])
.unstack(fill_value=0)
.stack()
.reset_index()
)
Output:
         date  product  sales
0  2020-01-01  clothes    120
1  2020-01-01     food     50
2  2020-01-01    glass    100
3  2020-01-02  clothes      0
4  2020-01-02     food     60
5  2020-01-02    glass     90
6  2020-01-03  clothes    130
7  2020-01-03     food      0
8  2020-01-03    glass    110
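To see why this works, it can help to look at the intermediate wide table. The following is only a sketch on the question's sample data; the names `wide` and `result` are chosen here for illustration. unstack() moves product into the columns, fill_value=0 fills the date/product combinations that had no row, and stack() turns the table back into one row per combination.

```python
import pandas as pd

df = pd.DataFrame({
    'date': ['2020-01-01', '2020-01-01', '2020-01-01', '2020-01-02',
             '2020-01-02', '2020-01-03', '2020-01-03'],
    'product': ['glass', 'clothes', 'food', 'glass', 'food', 'glass', 'clothes'],
    'sales': [100, 120, 50, 90, 60, 110, 130],
})

# Wide table: one row per date, one column per product, gaps filled with 0
wide = df.set_index(['date', 'product'])['sales'].unstack(fill_value=0)

# Back to long format: one row per (date, product) pair
result = wide.stack().rename('sales').reset_index()
```

Because fill_value is applied during unstack(), no NaN is ever created and sales keeps its integer dtype. The rows come back sorted by date and product rather than in the original order.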
Try with pivot:
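# Recent pandas versions require keyword arguments for pivot;
# df.pivot(*df.columns) works only on older releases
df = df.pivot(index='date', columns='product', values='sales').fillna(0).stack().to_frame('sales').reset_index()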
df
Out[120]:
         date  product  sales
0  2020-01-01  clothes  120.0
1  2020-01-01     food   50.0
2  2020-01-01    glass  100.0
3  2020-01-02  clothes    0.0
4  2020-01-02     food   60.0
5  2020-01-02    glass   90.0
6  2020-01-03  clothes  130.0
7  2020-01-03     food    0.0
8  2020-01-03    glass  110.0
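The sales column comes back as float here because pivot first produces NaN for the missing combinations, and NaN forces a float dtype. If integer values are wanted, one option is to cast after filling. This is a sketch on the question's sample data, with `wide` and `result` as illustrative names, and it assumes every sales value is a whole number:

```python
import pandas as pd

df = pd.DataFrame({
    'date': ['2020-01-01', '2020-01-01', '2020-01-01', '2020-01-02',
             '2020-01-02', '2020-01-03', '2020-01-03'],
    'product': ['glass', 'clothes', 'food', 'glass', 'food', 'glass', 'clothes'],
    'sales': [100, 120, 50, 90, 60, 110, 130],
})

# pivot leaves NaN where a (date, product) pair has no row, so dtype is float
wide = df.pivot(index='date', columns='product', values='sales')

result = (wide.fillna(0)      # replace the gaps with 0
              .astype(int)    # safe only because all values are whole numbers
              .stack()
              .to_frame('sales')
              .reset_index())
```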
Use set_index with reindex:
(df.set_index(['date', 'product'])
.reindex(pd.MultiIndex.from_product([df['date'].unique(),
df['product'].unique()],
names=['date', 'product']),
fill_value=0)
.reset_index())
Output:
         date  product  sales
0  2020-01-01    glass    100
1  2020-01-01  clothes    120
2  2020-01-01     food     50
3  2020-01-02    glass     90
4  2020-01-02  clothes      0
5  2020-01-02     food     60
6  2020-01-03    glass    110
7  2020-01-03  clothes    130
8  2020-01-03     food      0
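If the same fix is needed for other column pairs, the reindex approach can be wrapped in a small helper. The function name `fill_missing_combinations` and its parameters are hypothetical, not part of pandas, and the sketch assumes each key combination appears at most once in the input:

```python
import pandas as pd

def fill_missing_combinations(df, keys, fill_value=0):
    """Return df with one row for every combination of the key columns.

    Hypothetical helper: assumes each key combination occurs at most once.
    """
    # Every combination of the observed key values, in order of first appearance
    full_index = pd.MultiIndex.from_product(
        [df[k].unique() for k in keys], names=keys
    )
    return (df.set_index(keys)
              .reindex(full_index, fill_value=fill_value)
              .reset_index())

df = pd.DataFrame({
    'date': ['2020-01-01', '2020-01-01', '2020-01-01', '2020-01-02',
             '2020-01-02', '2020-01-03', '2020-01-03'],
    'product': ['glass', 'clothes', 'food', 'glass', 'food', 'glass', 'clothes'],
    'sales': [100, 120, 50, 90, 60, 110, 130],
})

result = fill_missing_combinations(df, ['date', 'product'])
```

Unlike the unstack() and pivot variants, this keeps the products in their original order (glass, clothes, food), which matches the output requested in the question.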