Python Pandas: group by multiple columns, filter on a specific value of a column, and forward-fill

Question

I have a large dataset containing messy data. The data looks like this:

import pandas as pd
from numpy import nan  # nan marks an empty (missing) cell

df1 = pd.DataFrame({'Batch':[1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
                    'Case':[1, 1, 1, 2, 2, 2, 2, 2, 2, 1, 1, 1, 2, 2, 2],
                    'Live':['Yes', 'Yes', 'No', 'Yes', 'No', 'No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No', 'Yes', 'Yes', 'No'],
                    'Task':['Download', nan, 'Download', 'Report', 'Report', nan, 'Download', nan, nan, nan, 'Download', 'Download', 'Report', nan, 'Report']
                    })

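For the purposes of this example, treat each nan as a genuinely empty cell (a missing value), not as the string 'nan'.

I need to group by 'Batch', then by 'Case', keep only the rows where 'Live' is 'Yes', and then forward-fill 'Task'.

Essentially, I want it to look like this

My current approach is: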

df['Task'] = df.groupby(['Batch','Case'])['Live'].filter(lambda x: x == 'Yes')['Task'].fillna(method='ffill')

I have tried several variations, but I keep getting errors such as "the filter must return a boolean result".

Does anyone know how I can do this?

Answer 1

You don't need filter; you can slice the rows where Live is 'Yes' before the groupby:

df1.Task=df1.loc[df1.Live=='Yes'].groupby(['Batch','Case']).Task.ffill()
df1
Out[620]: 
    Batch  Case Live      Task
0       1     1  Yes  Download
1       1     1  Yes  Download
2       1     1   No       NaN
3       1     2  Yes    Report
4       1     2   No       NaN
5       1     2   No       NaN
6       1     2  Yes  Download
7       1     2  Yes  Download
8       1     2  Yes  Download
9       2     1  Yes       NaN
10      2     1  Yes  Download
11      2     1   No       NaN
12      2     2  Yes    Report
13      2     2  Yes    Report
14      2     2   No       NaN
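
Note that in the output above, rows where Live is 'No' end up with NaN in Task even if they originally had a value (rows 2, 4, 11 and 14), because the assignment aligns on the index and those rows are absent from the sliced frame. The question does not say whether that is wanted. The following is a minimal sketch, not part of the original answer, that shows both behaviours side by side; the names `mask`, `filled`, `variant_a` and `variant_b` are made up for illustration.

```python
import pandas as pd
from numpy import nan

df1 = pd.DataFrame({
    'Batch': [1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
    'Case':  [1, 1, 1, 2, 2, 2, 2, 2, 2, 1, 1, 1, 2, 2, 2],
    'Live':  ['Yes', 'Yes', 'No', 'Yes', 'No', 'No', 'Yes', 'Yes', 'Yes',
              'Yes', 'Yes', 'No', 'Yes', 'Yes', 'No'],
    'Task':  ['Download', nan, 'Download', 'Report', 'Report', nan,
              'Download', nan, nan, nan, 'Download', 'Download',
              'Report', nan, 'Report'],
})

# Boolean mask selecting the rows to consider.
mask = df1['Live'] == 'Yes'

# Forward-fill Task within each (Batch, Case) group, using only the 'Yes' rows.
# The result keeps the original index labels of those rows.
filled = df1[mask].groupby(['Batch', 'Case'])['Task'].ffill()

# Variant A: same result as the answer. Assigning the whole column aligns on
# the index, so rows missing from `filled` (Live == 'No') become NaN.
variant_a = df1.assign(Task=filled)

# Variant B: write only into the 'Yes' rows, so 'No' rows keep their
# original Task value.
variant_b = df1.copy()
variant_b.loc[mask, 'Task'] = filled

print(variant_b)
```

With variant B, row 2 keeps 'Download' and row 14 keeps 'Report', while the 'Yes' rows are filled exactly as in the answer.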

Solution 1
1 2018-08-23 01:10:16
