Pandas pivot_table: filtering the aggregate function

Question

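I'm trying to pass a criterion to the aggregate function of a pandas pivot_table, but I can't figure out how to pass the criterion to aggfunc. I have a data table that I converted to a pandas df.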

Input table data:

col1	col2	col3	col4	col5	col6	col7
1	test1	t1	Dummy1	result1	10	102.2
2	test1	t1	Dummy2	result2	20	101.2
3	test1	t1	Dummy3	result3	30	102.3
4	test1	t1	Dummy4	result4	40	101.4
5	test2	t1	Dummy1	result1	10	100
6	test2	t1	Dummy2	result2	20	103
7	test2	t1	Dummy3	result3	30	104
8	test2	t1	Dummy4	result4	40	105
9	test3	t1	Dummy1	result1	10	102
10	test3	t1	Dummy2	result2	20	87
11	test3	t1	Dummy3	result3	30	107
12	test3	t1	Dummy5	result4	50	110.2
13	test4	t1	Dummy2	result2	20	120
14	test5	t1	Dummy6	result1	100	88
15	test1	t1	Dummy1	result2	10	106.2
16	test1	t1	Dummy1	result6	10	101.1

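I want to get the max of the col7 data, but only when that max is greater than 100. If any col7 value meets the user-defined criterion, all the other column data in that row must be populated too, whether or not that data meets the criterion.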

I tried the following:

import pandas as pd

columns = ['col1', 'col2', 'col3', 'col4', 'col5', 'col6', 'col7']

df = pd.DataFrame({
    'col1': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16],
    'col2': ['test1', 'test1', 'test1', 'test1', 'test2', 'test2', 'test2',
             'test2', 'test3', 'test3', 'test3', 'test3', 'test4', 'test5',
             'test1', 'test1'],
    'col3': ['t1', 't1', 't1', 't1', 't1', 't1', 't1', 't1', 't1', 't1', 't1',
             't1', 't1', 't1', 't1', 't1'],
    'col4': ['Dummy1', 'Dummy2', 'Dummy3', 'Dummy4', 'Dummy1', 'Dummy2',
             'Dummy3', 'Dummy4', 'Dummy1', 'Dummy2', 'Dummy3', 'Dummy5',
             'Dummy2', 'Dummy6', 'Dummy1', 'Dummy1'],
    'col5': ['result1', 'result2', 'result3', 'result4', 'result1', 'result2',
             'result3', 'result4', 'result1', 'result2', 'result3', 'result4',
             'result2', 'result1', 'result2', 'result6'],
    'col6': [10, 20, 30, 40, 10, 20, 30, 40, 10, 20, 30, 50, 20, 100, 10, 10],
    'col7': [102.2, 101.2, 102.3, 101.4, 100.0, 103.0, 104.0, 105.0, 102.0,
             87.0, 107.0, 110.2, 120.0, 88.0, 106.2, 101.1]
})

res=df.pivot_table(values = 'col7', index = ['col4', 'col5', 'col6'], columns = ['col2'], fill_value = '', aggfunc = 'max' >= 100)

TypeError: '>=' not supported between instances of 'str' and 'int'

The output should look like this:

Max pivot output without col5:

col4	col6	test1	test2	test3	test4	test5
Dummy1	10	106.2	100	102	N/A	N/A
Dummy2	20	101.2	103	87	120	N/A
Dummy3	30	102.3	104	107	N/A	N/A
Dummy4	40	101.4	105	N/A	N/A	N/A
Dummy5	50	N/A	N/A	110.2	N/A	N/A

Max pivot output including col5:

col4	col5	col6	test1	test2	test3	test4	test5
Dummy1	result2	10	106.2	N/A	N/A	N/A	N/A
Dummy1	result1	10	102.2	100	102	N/A	N/A
Dummy2	result2	20	101.2	103	87	120	N/A
Dummy3	result3	30	102.3	104	107	N/A	N/A
Dummy4	result4	40	101.4	105	N/A	N/A	N/A
Dummy5	result4	50	N/A	N/A	110.2	N/A	N/A

Any guidance is much appreciated.

Thanks

Answer 1

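You can't compare the word 'max' with 100 via >= (aggfunc = 'max' >= 100). Python evaluates 'max' >= 100 before pivot_table is even called, and comparing a str with an int raises the TypeError shown above.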
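pivot_table's aggfunc does accept a callable, so a criterion can technically be passed that way. A minimal sketch, assuming the criterion is applied to each cell on its own; `max_if_at_least` and the four-row frame are hypothetical. Because it judges every cell independently, a cell such as 87 is blanked even when its row also holds a value above 100, which is why this answer filters whole rows instead.

```python
import numpy as np
import pandas as pd

# Hypothetical four-row subset shaped like the question's data
df = pd.DataFrame({
    'col2': ['test1', 'test1', 'test3', 'test3'],
    'col4': ['Dummy1', 'Dummy2', 'Dummy1', 'Dummy2'],
    'col7': [102.2, 101.2, 102.0, 87.0],
})


def max_if_at_least(values, threshold=100):
    """Hypothetical aggfunc: the group's max, or NaN if that max is below threshold."""
    m = values.max()
    return m if m >= threshold else np.nan


# pivot_table hands each (col4, col2) group's col7 values to the callable
res = df.pivot_table(values='col7', index='col4', columns='col2',
                     aggfunc=max_if_at_least)
print(res)
```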

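I suggest not setting fill_value to a string (a string fill would break the numeric >= 100 comparison later). Instead, mask the pivoted DataFrame to remove the unwanted rows, then replace NaN with empty strings via fillna: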

res = df.pivot_table(values='col7', index=['col4', 'col5', 'col6'],
                     columns=['col2'], aggfunc='max')

col2                 test1  test2  test3  test4  test5
col4   col5    col6                                   
Dummy1 result1 10    102.2  100.0  102.0    NaN    NaN
       result2 10    106.2    NaN    NaN    NaN    NaN
       result6 10    101.1    NaN    NaN    NaN    NaN
Dummy2 result2 20    101.2  103.0   87.0  120.0    NaN
Dummy3 result3 30    102.3  104.0  107.0    NaN    NaN
Dummy4 result4 40    101.4  105.0    NaN    NaN    NaN
Dummy5 result4 50      NaN    NaN  110.2    NaN    NaN
Dummy6 result1 100     NaN    NaN    NaN    NaN   88.0

Keep the rows where any value in res is >= 100, then fillna:

res = res[(res >= 100).any(axis=1)].fillna('')

col2                 test1  test2  test3  test4 test5
col4   col5    col6                                  
Dummy1 result1 10    102.2  100.0  102.0             
       result2 10    106.2                           
       result6 10    101.1                           
Dummy2 result2 20    101.2  103.0   87.0  120.0      
Dummy3 result3 30    102.3  104.0  107.0             
Dummy4 result4 40    101.4  105.0                    
Dummy5 result4 50                  110.2
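A small sketch of what the mask does, on a hand-built frame (the values are illustrative, shaped like the pivot above): comparing NaN with 100 yields False, so `(res >= 100).any(axis=1)` is True only for rows with at least one real value >= 100, and boolean indexing keeps those rows with all of their cells, including values below 100.

```python
import numpy as np
import pandas as pd

# Illustrative pivot-shaped frame (not the full question data)
res = pd.DataFrame(
    {'test1': [101.2, np.nan, np.nan],
     'test3': [87.0, 110.2, np.nan],
     'test5': [np.nan, np.nan, 88.0]},
    index=['Dummy2', 'Dummy5', 'Dummy6'],
)

# NaN >= 100 evaluates to False, so missing cells never satisfy the criterion
mask = (res >= 100).any(axis=1)
print(mask.tolist())  # [True, True, False]

# Keep qualifying rows (87.0 survives because its row qualifies), then blank NaN
filtered = res[mask].fillna('')
print(filtered)
```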

Optionally, use reset_index to flatten the MultiIndex and rename_axis to clear the column axis name:

res[(res >= 100).any(axis=1)].fillna('').reset_index().rename_axis(None, axis=1)

     col4     col5  col6  test1  test2  test3  test4 test5
0  Dummy1  result1    10  102.2  100.0  102.0             
1  Dummy1  result2    10  106.2                           
2  Dummy1  result6    10  101.1                           
3  Dummy2  result2    20  101.2  103.0   87.0  120.0      
4  Dummy3  result3    30  102.3  104.0  107.0             
5  Dummy4  result4    40  101.4  105.0                    
6  Dummy5  result4    50                110.2
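The effect of these two calls can be checked on a tiny pivot (illustrative data): pivot_table names the column axis after `col2`, `reset_index` turns the index levels back into ordinary columns, and `rename_axis(None, axis=1)` removes the leftover `col2` label from the column axis.

```python
import pandas as pd

# Illustrative two-row input
df = pd.DataFrame({
    'col2': ['test1', 'test2'],
    'col4': ['Dummy1', 'Dummy1'],
    'col6': [10, 10],
    'col7': [102.2, 100.0],
})

res = df.pivot_table(values='col7', index=['col4', 'col6'],
                     columns='col2', aggfunc='max')
print(res.columns.name)  # col2

# Index levels become columns; then drop the column-axis name
flat = res.reset_index().rename_axis(None, axis=1)
print(flat.columns.name)  # None
print(list(flat.columns))
```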

Full working example:

import pandas as pd

df = pd.DataFrame({
    'col1': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16],
    'col2': ['test1', 'test1', 'test1', 'test1', 'test2', 'test2', 'test2',
             'test2', 'test3', 'test3', 'test3', 'test3', 'test4', 'test5',
             'test1', 'test1'],
    'col3': ['t1', 't1', 't1', 't1', 't1', 't1', 't1', 't1', 't1', 't1', 't1',
             't1', 't1', 't1', 't1', 't1'],
    'col4': ['Dummy1', 'Dummy2', 'Dummy3', 'Dummy4', 'Dummy1', 'Dummy2',
             'Dummy3', 'Dummy4', 'Dummy1', 'Dummy2', 'Dummy3', 'Dummy5',
             'Dummy2', 'Dummy6', 'Dummy1', 'Dummy1'],
    'col5': ['result1', 'result2', 'result3', 'result4', 'result1', 'result2',
             'result3', 'result4', 'result1', 'result2', 'result3', 'result4',
             'result2', 'result1', 'result2', 'result6'],
    'col6': [10, 20, 30, 40, 10, 20, 30, 40, 10, 20, 30, 50, 20, 100, 10, 10],
    'col7': [102.2, 101.2, 102.3, 101.4, 100.0, 103.0, 104.0, 105.0, 102.0,
             87.0, 107.0, 110.2, 120.0, 88.0, 106.2, 101.1]
})

res = df.pivot_table(values='col7', index=['col4', 'col5', 'col6'],
                     columns=['col2'], aggfunc='max')
# Keep rows where at least one max is >= 100, then blank out the NaN cells
res = (
    res[(res >= 100).any(axis=1)]
    .fillna('')
    .reset_index()
    .rename_axis(None, axis=1)
)
print(res)
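Since the question asks for a user-defined criterion, the same steps can be wrapped in a function that takes the threshold as a parameter. This is a sketch; `pivot_max_rows_meeting` and its default arguments are hypothetical names, not part of pandas, and the input frame is illustrative. It keeps the answer's `>=`; switch to `>` if the criterion must be strictly greater than the threshold.

```python
import pandas as pd


def pivot_max_rows_meeting(df, index, threshold, values='col7', columns='col2'):
    """Hypothetical helper: max pivot keeping rows where any max >= threshold."""
    res = df.pivot_table(values=values, index=index, columns=columns,
                         aggfunc='max')
    # Keep the whole row when at least one cell meets the criterion
    res = res[(res >= threshold).any(axis=1)]
    return res.fillna('').reset_index().rename_axis(None, axis=1)


# Illustrative subset of the question's data
df = pd.DataFrame({
    'col2': ['test1', 'test3', 'test5', 'test1'],
    'col4': ['Dummy2', 'Dummy2', 'Dummy6', 'Dummy1'],
    'col6': [20, 20, 100, 10],
    'col7': [101.2, 87.0, 88.0, 102.2],
})

out = pivot_max_rows_meeting(df, ['col4', 'col6'], threshold=100)
print(out)
strict = pivot_max_rows_meeting(df, ['col4', 'col6'], threshold=102)
print(strict)
```

Passing `['col4', 'col6']` or `['col4', 'col5', 'col6']` as `index` gives the two layouts shown in this answer.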

To get the values without col5, remove it from the pivot_table index:

res = df.pivot_table(values='col7', index=['col4', 'col6'],
                     columns=['col2'], aggfunc='max')
res = (
    res[(res >= 100).any(axis=1)]
    .fillna('')
    .reset_index()
    .rename_axis(None, axis=1)
)

     col4  col6  test1  test2  test3  test4 test5
0  Dummy1    10  106.2  100.0  102.0             
1  Dummy2    20  101.2  103.0   87.0  120.0      
2  Dummy3    30  102.3  104.0  107.0             
3  Dummy4    40  101.4  105.0                    
4  Dummy5    50                110.2
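A variant, swapping in `groupby(...).transform('max')` to apply the row criterion to the raw data before pivoting. This is an alternative technique rather than the method above, sketched on illustrative data. One visible difference: a `col2` value whose rows are all filtered out (here `test5`) disappears from the columns entirely, whereas pivoting first and then masking keeps it as a blank column.

```python
import pandas as pd

# Illustrative subset of the question's data
df = pd.DataFrame({
    'col2': ['test1', 'test3', 'test5', 'test1'],
    'col4': ['Dummy2', 'Dummy2', 'Dummy6', 'Dummy1'],
    'col6': [20, 20, 100, 10],
    'col7': [101.2, 87.0, 88.0, 102.2],
})

keys = ['col4', 'col6']
# For every raw row: the largest col7 within its (col4, col6) group
group_max = df.groupby(keys)['col7'].transform('max')
kept = df[group_max >= 100]

res = (kept.pivot_table(values='col7', index=keys, columns='col2', aggfunc='max')
       .fillna('')
       .reset_index()
       .rename_axis(None, axis=1))
print(res)
```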

Answer 2

Alternatively, you can try:

res = (
    df.assign(col7=df.col7.where(df.col7 > 100))
      .pivot_table(values='col7', index=['col4', 'col5', 'col6'],
                   columns=['col2'], aggfunc='max', fill_value='')
)
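To make the difference between the two answers concrete: `Series.where` keeps values that satisfy the condition and replaces the rest with NaN, so this approach blanks individual values (such as 87, and also 100.0 because the test is strict `> 100`) before aggregating, whereas the mask-after-pivot approach keeps every value in a qualifying row. A minimal comparison on illustrative data, with fill_value omitted so the blanks stay visible as NaN:

```python
import pandas as pd

# Illustrative data: Dummy1 has one value above 100 and one below
df = pd.DataFrame({
    'col2': ['test1', 'test3', 'test3'],
    'col4': ['Dummy1', 'Dummy1', 'Dummy2'],
    'col7': [102.2, 87.0, 107.0],
})

# Blank out individual values first, then pivot
per_value = (df.assign(col7=df.col7.where(df.col7 > 100))
               .pivot_table(values='col7', index='col4', columns='col2',
                            aggfunc='max'))

# Pivot first, then keep whole rows that have any value >= 100
per_row = df.pivot_table(values='col7', index='col4', columns='col2',
                         aggfunc='max')
per_row = per_row[(per_row >= 100).any(axis=1)]

print(per_value.loc['Dummy1', 'test3'])  # nan
print(per_row.loc['Dummy1', 'test3'])    # 87.0
```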

Answer 1: score 1, accepted, 2021-06-06 19:30:27

Answer 2: score 0, 2021-06-06 19:48:40