简体   繁体   中英

Pandas: In pivot_table, how to custom fill missing values?

I want to fill the missing values in my Pandas pivot_table with values from the index and to fill the missing Year Week columns.

import pandas as pd

d = { 'Year': [2019,2019,2019,2019,2019,2019],
        'Week': [1,2,3,4,5,6],
        'Part': ['A','A','A','B','B','B'],
        'Static': [20,20,20,40,40,40],
        'Value': [np.nan,10,50,np.nan,30,np.nan]
}

df = pd.DataFrame(d)

pivot = df.pivot_table(index=['Part','Static'], columns=['Year', 'Week'], values=['Value'])

print(pivot)

            Value
Year         2019
Week            2     3     5
Part Static
A    20      10.0  50.0   NaN
B    40       NaN   NaN  30.0

In the example above, the Weeks 1, 4 & 6 are missing because they don't have values. As for the NaN , I want to fill them with a value from the "left", so for Week 1 for Part A the value will be 20.0, and for Week 4 to 6 will be 50.0, and the same for Part B where all NaN will be filled with values from the left.

The expected output is

            Value
Year         2019
Week           1     2     3     4     5     6
Part Static
A    20     20.0  10.0  50.0  50.0  50.0  50.0
B    40     40.0  40.0  40.0  40.0  30.0  30.0

PS: I can refer to a reference calendar dataframe to pull in all the Year Week values.

Edit:

I tested the solution on my data, but it seems to not work. Here is an updated data with Week 4 being removed.

d = { 'Year': [2019,2019,2019,2019,2019],
        'Week': [1,2,3,5,6],
        'Part': ['A','A','A','B','B'],
        'Static': [20,20,20,40,40],
        'Value': [np.nan,10,50,30,np.nan]
   }

df = pd.DataFrame(d)

#Year Week data set for reference
d2 = {'Year':[2019,2019,2019,2019,2019,2019,2019,2019,2019,2019],
        'Week':[1,2,3,4,5,6,7,8,9,10] }

fill the column Value , first filling down the column, and then filling across the with the Static value

df.Value = df.groupby('Part')[['Static', 'Value']].ffill().ffill(axis=1).Value

After this operation, the Value column has an object type. So it is necessary to cast as int .

df.Value = df.Value.astype('int')

Then, pivot as usual, but also ffill & bfill after on the horizontal axis

df.pivot_table(index=['Part','Static'], columns=['Year', 'Week'], values=['Value']).ffill(axis=1).bfill(axis=1)

# outputs:

            Value
Year         2019
Week            1     2     3     4     5     6
Part Static
A    20      20.0  10.0  50.0  50.0  50.0  50.0
B    40      40.0  40.0  40.0  40.0  30.0  30.0

unstack reset_index and fillna is one option:

df.set_index(['Year','Week', 'Part', 'Static']).unstack([0,1]).reset_index().fillna(method='ffill', axis=1)

     Part Static Value                    
Year              2019                    
Week                 1   2   3   4   5   6
0       A     20    20  10  50  50  50  50
1       B     40    40  40  40  40  30  30

fillna with methond='ffill' will forward fill data so when you set axis=1 it forward fills left to right.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM