简体   繁体   中英

Add rows for missing data grouped by another column in Pandas DataFrame

I have a Pandas dataframe where for certain dates certain products are missing. I want to add those rows to the dataframe and assign them a sales value of 0. How can I do that?

# Sample dataframe
import pandas as pd
df = pd.DataFrame({
    'date': ['2020-01-01', '2020-01-01', '2020-01-01', '2020-01-02', '2020-01-02', '2020-01-03', '2020-01-03'],
    'product': ['glass', 'clothes', 'food', 'glass', 'food', 'glass', 'clothes'],
    'sales': [100, 120, 50, 90, 60, 110, 130]
})

        date    product sales
0   2020-01-01  glass   100
1   2020-01-01  clothes 120
2   2020-01-01  food    50
3   2020-01-02  glass   90
4   2020-01-02  food    60
5   2020-01-03  glass   110
6   2020-01-03  clothes 130

## 'clothes' is missing for 2020-01-02 and 'food' is missing for 2020-01-03
## What I want to get: 
        date    product sales
0   2020-01-01  glass   100
1   2020-01-01  clothes 120
2   2020-01-01  food    50
3   2020-01-02  glass   90
4   2020-01-02  clothes 0
5   2020-01-02  food    60
6   2020-01-03  glass   110
7   2020-01-03  clothes 130
8   2020-01-03  food    0

You can do with unstack()/stack() :

(df.set_index(['date','product'])
   .unstack(fill_value=0)
   .stack()
   .reset_index()
)

Output:

         date  product  sales
0  2020-01-01  clothes    120
1  2020-01-01     food     50
2  2020-01-01    glass    100
3  2020-01-02  clothes      0
4  2020-01-02     food     60
5  2020-01-02    glass     90
6  2020-01-03  clothes    130
7  2020-01-03     food      0
8  2020-01-03    glass    110

Try with pivot

df=df.pivot(*df.columns).fillna(0).stack().to_frame('sales').reset_index()
df
Out[120]: 
         date  product  sales
0  2020-01-01  clothes  120.0
1  2020-01-01     food   50.0
2  2020-01-01    glass  100.0
3  2020-01-02  clothes    0.0
4  2020-01-02     food   60.0
5  2020-01-02    glass   90.0
6  2020-01-03  clothes  130.0
7  2020-01-03     food    0.0
8  2020-01-03    glass  110.0

Use set_index with reindex :

(df.set_index(['date', 'product'])
   .reindex(pd.MultiIndex.from_product([df['date'].unique(), 
                                        df['product'].unique()], 
                                       names=['date', 'product']), 
            fill_value=0)
   .reset_index())

Output:

         date  product  sales
0  2020-01-01    glass    100
1  2020-01-01  clothes    120
2  2020-01-01     food     50
3  2020-01-02    glass     90
4  2020-01-02  clothes      0
5  2020-01-02     food     60
6  2020-01-03    glass    110
7  2020-01-03  clothes    130
8  2020-01-03     food      0

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM