Pandas pivot table: error when filtering by a condition

Question

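I have a DataFrame, and I am trying to build a filtered DataFrame that keeps only the rows whose values meet a certain condition. The problem is that each value in the column spans two lines, and the comparison has to be made against the first line of the value. For example, if the col7 value is '100.2\n11', I need to compare 100.2 against the condition, and if it satisfies the condition, the final DataFrame should contain the full value ('100.2\n11'), not just 100.2.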

import pandas as pd
import numpy as np

df = pd.DataFrame({'col1': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16],
    'col2': ['test1', 'test1', 'test1', 'test1', 'test2', 'test2', 'test2',
             'test2', 'test3', 'test3', 'test3', 'test3', 'test4', 'test5',
             'test1', 'test1'],
    'col3': ['t1', 't1', 't1', 't1', 't1', 't1', 't1', 't1', 't1', 't1', 't1',
             't1', 't1', 't1', 't1', 't1'],
    'col4': ['input1', 'input2', 'input3', 'input4', 'input1', 'input2',
             'input3', 'input4', 'input1', 'input2', 'input3', 'input5',
             'input2', 'input6', 'input1', 'input1'],
    'col5': ['result1', 'result2', 'result3', 'result4', 'result1', 'result2',
             'result3', 'result4', 'result1', 'result2', 'result3', 'result4',
             'result2', 'result1', 'result2', 'result6'],
    'col6': [10, 20, 30, 40, 10, 20, 30, 40, 10, 20, 30, 50, 20, 100, 10, 10],
    'col7': ['100.2\n11','101.2\n21','102.3\n34','101.4\n41','100.0\n10','103.0\n20.6','104.0\n31.2','105.0\n42','102.0\n10.2',
             '87.0\n15','107.0\n32.1','110.2\n61.2','120.0\n22.4','88.0\n90','106.2\n16.2','101.1\n10.1']})

df1=df.pivot_table(values = 'col7', index = ['col4', 'col5', 'col6'], columns = ['col2'], aggfunc = 'max')
df2 = df1[((df1.groupby(level='col4').rank(ascending=False) == 1.).any(axis=1)) & (df1 >= 105).any(axis=1)]

print(df2)

I get the following error:

  File "pandas\_libs\ops.pyx", line 107, in pandas._libs.ops.scalar_compare
TypeError: '>=' not supported between instances of 'str' and 'int'

Once the condition is applied, the final pivot table output should look like this:

col2                   test1          test2           test3         test4        test5
col4   col5    col6                                                
input1 result2 10    106.2\n16.2       NaN             NaN           NaN          NaN
input2 result2 20    101.2\n21      103.0\n20.6      87.0\n15      120.0\n22.4    NaN
input3 result3 30    102.3\n34      104.0\n31.2     107.0\n32.1      NaN          NaN
input4 result4 40    101.4\n41      105.0\n42           NaN          NaN          NaN
input5 result4 50       NaN            NaN          110.2\n61.2      NaN          NaN

Any guidance is greatly appreciated. Thanks in advance.

Answer 1

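The comparison fails because the cells of df1 are strings such as '100.2\n11', which cannot be compared with the integer 105. You can use pandas applymap to create a helper DataFrame that contains only the first-line value of each cell in df1 as a float, evaluate the filter conditions on that helper, and use the resulting boolean mask to select rows from df1, so the full two-line values are preserved. (In pandas 2.1 and later, applymap is deprecated in favor of DataFrame.map, which takes the same function.)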

...
...
df1=df.pivot_table(values = 'col7', index = ['col4', 'col5', 'col6'], columns = ['col2'], aggfunc = 'max')

df_tmp = df1.applymap(lambda x: float(str(x).split('\n')[0]))

df2 = df1[
    ((df_tmp.groupby(level='col4').rank(ascending=False) == 1.).any(axis=1)) &
    (df_tmp >= 105).any(axis=1)
]

print(df2)

col2                       test1        test2        test3        test4 test5
col4   col5    col6
input1 result2 10    106.2\n16.2          NaN          NaN          NaN   NaN
input2 result2 20      101.2\n21  103.0\n20.6     87.0\n15  120.0\n22.4   NaN
input3 result3 30      102.3\n34  104.0\n31.2  107.0\n32.1          NaN   NaN
input4 result4 40      101.4\n41    105.0\n42          NaN          NaN   NaN
input5 result4 50            NaN          NaN  110.2\n61.2          NaN   NaN
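
If you would rather not depend on applymap, the same helper can be built column by column with the string accessor. This is only a minimal sketch under some assumptions: the small df1 below is a made-up stand-in for the pivoted frame, first_line_as_float is a hypothetical helper name, and only the >= 105 condition is shown (the rank condition from above can be combined with it in the same way).

```python
import pandas as pd

# Made-up stand-in for the pivoted frame df1; not the data from the question.
df1 = pd.DataFrame(
    {"test1": ["100.2\n11", "106.2\n16.2"], "test2": ["103.0\n20.6", None]},
    index=pd.Index(["input1", "input2"], name="col4"),
)

def first_line_as_float(column: pd.Series) -> pd.Series:
    """Hypothetical helper: parse the text before the first newline as a number."""
    first_line = column.astype(str).str.split("\n").str[0]
    # errors="coerce" turns anything that is not a number (e.g. missing cells) into NaN.
    return pd.to_numeric(first_line, errors="coerce")

# Numeric helper frame with the same index and columns as df1.
df_tmp = df1.apply(first_line_as_float)

# Build the mask on the helper, but select from df1 so the full strings survive.
mask = (df_tmp >= 105).any(axis=1)
df2 = df1[mask]
print(df2)
```

The mask is computed on df_tmp and applied to df1, which is why the selected rows still show the complete two-line values.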
