Python Pandas - Group by multiple columns, filter for a certain value in a certain column, and fillna

Question

I have a large dataset with messy data. The data looks like this:

import numpy as np
import pandas as pd

# np.nan marks the empty cells in the 'Task' column
df1 = pd.DataFrame({'Batch': [1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
                    'Case': [1, 1, 1, 2, 2, 2, 2, 2, 2, 1, 1, 1, 2, 2, 2],
                    'Live': ['Yes', 'Yes', 'No', 'Yes', 'No', 'No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No', 'Yes', 'Yes', 'No'],
                    'Task': ['Download', np.nan, 'Download', 'Report', 'Report', np.nan, 'Download', np.nan, np.nan, np.nan, 'Download', 'Download', 'Report', np.nan, 'Report']
    })

For the purposes of the example, please treat each nan as an empty cell (a missing value), not the string 'nan'.

I need to group by 'Batch' and then by 'Case', keep only the rows where 'Live' has the value 'Yes', and then fill 'Task' downwards (forward fill) within each group.

I essentially want it to look something like this

My current approach has been:

df['Task'] = df.groupby(['Batch','Case'])['Live'].filter(lambda x: x == 'Yes')['Task'].fillna(method='ffill')

I've tried a number of variations, but I keep getting errors like "the filter must return a boolean result"
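
For context, that error is consistent with how groupby filter works: it keeps or drops whole groups, so the function passed to it is expected to return a single True or False per group, whereas `x == 'Yes'` returns one boolean per row. A minimal sketch of the difference, using a small made-up frame (the names `df`, `kept`, and `live_rows` are illustrative, not from the question):

```python
import numpy as np
import pandas as pd

# Small illustrative frame (hypothetical data, not the question's dataset)
df = pd.DataFrame({
    'Batch': [1, 1, 1, 2, 2],
    'Live': ['Yes', 'No', 'Yes', 'No', 'No'],
    'Task': ['Download', np.nan, np.nan, 'Report', np.nan],
})

# groupby(...).filter decides per group: the function returns ONE bool,
# and every row of a passing group is kept (here, all of Batch 1).
kept = df.groupby('Batch').filter(lambda g: (g['Live'] == 'Yes').any())

# Selecting individual rows is a job for a boolean mask, not filter.
live_rows = df[df['Live'] == 'Yes']

print(kept)
print(live_rows)
```

In other words, filter answers "which groups stay?", while a boolean mask answers "which rows stay?", and the question needs the latter.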

Does anyone know how I can go about doing this?

Answer 1

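You do not need filter here. Select the rows where Live is 'Yes' before the groupby, then forward-fill Task within each (Batch, Case) group. Assigning the result back aligns on the index, so every row where Live is 'No' ends up with NaN in Task (including rows that previously had a value, such as row 2):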

df1['Task'] = df1.loc[df1['Live'] == 'Yes'].groupby(['Batch', 'Case'])['Task'].ffill()
df1
Out[620]: 
    Batch  Case Live      Task
0       1     1  Yes  Download
1       1     1  Yes  Download
2       1     1   No       NaN
3       1     2  Yes    Report
4       1     2   No       NaN
5       1     2   No       NaN
6       1     2  Yes  Download
7       1     2  Yes  Download
8       1     2  Yes  Download
9       2     1  Yes       NaN
10      2     1  Yes  Download
11      2     1   No       NaN
12      2     2  Yes    Report
13      2     2  Yes    Report
14      2     2   No       NaN
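
If the existing Task values in the 'No' rows should be kept rather than replaced with NaN, one possible variant is to write the filled values back only into the 'Yes' rows. This is a sketch under that assumption, not part of the original answer; the names `live`, `filled`, `result_a`, and `result_b` are illustrative.

```python
import numpy as np
import pandas as pd

nan = np.nan
df1 = pd.DataFrame({
    'Batch': [1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
    'Case': [1, 1, 1, 2, 2, 2, 2, 2, 2, 1, 1, 1, 2, 2, 2],
    'Live': ['Yes', 'Yes', 'No', 'Yes', 'No', 'No', 'Yes', 'Yes', 'Yes',
             'Yes', 'Yes', 'No', 'Yes', 'Yes', 'No'],
    'Task': ['Download', nan, 'Download', 'Report', 'Report', nan, 'Download',
             nan, nan, nan, 'Download', 'Download', 'Report', nan, 'Report'],
})

# Boolean mask for the rows of interest
live = df1['Live'] == 'Yes'

# Forward-fill Task within each (Batch, Case) group, using only the 'Yes' rows.
# The result keeps the original index labels of those rows.
filled = df1.loc[live].groupby(['Batch', 'Case'])['Task'].ffill()

# Same behaviour as the answer: index alignment leaves NaN in every 'No' row.
result_a = df1.assign(Task=filled)

# Variant: overwrite only the 'Yes' rows, leaving 'No' rows as they were.
result_b = df1.copy()
result_b.loc[live, 'Task'] = filled

print(result_a)
print(result_b)
```

Both versions leave row 9 as NaN, because forward fill has no earlier value to copy within that group.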


solution1
1 2018-08-23 01:10:16
