Pandas pivot table: error when filtering by condition

Question

I have a dataframe which I pivoted, and I am trying to create an updated dataframe that keeps only the rows whose values meet a certain condition. The problem is that each value in the columns is a string made up of two lines. The comparison needs to be done on the first line of the value. For example, if the col7 value is '100.2\n11', then I need to compare 100.2 against the condition, and if it satisfies the condition, the final dataframe should contain the full value ('100.2\n11') and not just 100.2.

import pandas as pd
import numpy as np

df = pd.DataFrame({'col1': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16],
    'col2': ['test1', 'test1', 'test1', 'test1', 'test2', 'test2', 'test2',
             'test2', 'test3', 'test3', 'test3', 'test3', 'test4', 'test5',
             'test1', 'test1'],
    'col3': ['t1', 't1', 't1', 't1', 't1', 't1', 't1', 't1', 't1', 't1', 't1',
             't1', 't1', 't1', 't1', 't1'],
    'col4': ['input1', 'input2', 'input3', 'input4', 'input1', 'input2',
             'input3', 'input4', 'input1', 'input2', 'input3', 'input5',
             'input2', 'input6', 'input1', 'input1'],
    'col5': ['result1', 'result2', 'result3', 'result4', 'result1', 'result2',
             'result3', 'result4', 'result1', 'result2', 'result3', 'result4',
             'result2', 'result1', 'result2', 'result6'],
    'col6': [10, 20, 30, 40, 10, 20, 30, 40, 10, 20, 30, 50, 20, 100, 10, 10],
    'col7': ['100.2\n11','101.2\n21','102.3\n34','101.4\n41','100.0\n10','103.0\n20.6','104.0\n31.2','105.0\n42','102.0\n10.2',
             '87.0\n15','107.0\n32.1','110.2\n61.2','120.0\n22.4','88.0\n90','106.2\n16.2','101.1\n10.1']})

df1=df.pivot_table(values = 'col7', index = ['col4', 'col5', 'col6'], columns = ['col2'], aggfunc = 'max')
df2 = df1[((df1.groupby(level='col4').rank(ascending=False) == 1.).any(axis=1)) & (df1 >= 105).any(axis=1)]

print(df2)

I am getting the following error:

  File "pandas\_libs\ops.pyx", line 107, in pandas._libs.ops.scalar_compare
TypeError: '>=' not supported between instances of 'str' and 'int'

After the condition is applied, the final pivot table output should look like this:

col2                   test1          test2           test3         test4        test5
col4   col5    col6                                                
input1 result2 10    106.2\n16.2       NaN             NaN           NaN          NaN
input2 result2 20    101.2\n21      103.0\n20.6      87.0\n15      120.0\n22.4    NaN
input3 result3 30    102.3\n34      104.0\n31.2     107.0\n32.1      NaN          NaN
input4 result4 40    101.4\n41      105.0\n42           NaN          NaN          NaN
input5 result4 50       NaN            NaN          110.2\n61.2      NaN          NaN

Any guidance is much appreciated. Thanks in advance.

Answer 1

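The error occurs because df1 holds strings, so df1 >= 105 compares a str with an int. You could use pandas applymap to create an auxiliary dataframe that contains only the first-line values from df1, converted to floats, evaluate the filter conditions on that auxiliary dataframe, and use the resulting mask to select rows from df1, so the full two-line values are kept.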

...
...
df1=df.pivot_table(values = 'col7', index = ['col4', 'col5', 'col6'], columns = ['col2'], aggfunc = 'max')

df_tmp = df1.applymap(lambda x: float(str(x).split('\n')[0]))

df2 = df1[
    ((df_tmp.groupby(level='col4').rank(ascending=False) == 1.).any(axis=1)) &
    (df_tmp >= 105).any(axis=1)
]

print(df2)

col2                       test1        test2        test3        test4 test5
col4   col5    col6
input1 result2 10    106.2\n16.2          NaN          NaN          NaN   NaN
input2 result2 20      101.2\n21  103.0\n20.6     87.0\n15  120.0\n22.4   NaN
input3 result3 30      102.3\n34  104.0\n31.2  107.0\n32.1          NaN   NaN
input4 result4 40      101.4\n41    105.0\n42          NaN          NaN   NaN
input5 result4 50            NaN          NaN  110.2\n61.2          NaN   NaN
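
As a side note, DataFrame.applymap is deprecated in pandas 2.1 and later in favor of DataFrame.map. The sketch below shows the same idea in a way that should run on both older and newer versions, by calling Series.map on each column. It uses a small made-up frame rather than the data from the question, and the helper name first_line_as_float is hypothetical, not part of pandas.

```python
import numpy as np
import pandas as pd


def first_line_as_float(cell):
    # Hypothetical helper: '100.2\n11' -> 100.2.
    # Missing cells pass through: str(nan) is 'nan' and float('nan') is NaN.
    return float(str(cell).split("\n")[0])


# Made-up stand-in for the pivoted frame (two-line strings plus a missing cell).
pivot = pd.DataFrame(
    {
        "test1": ["100.2\n11", "106.2\n16.2", "99.0\n5"],
        "test2": ["105.0\n42", np.nan, "87.0\n15"],
    },
    index=pd.Index(["input1", "input2", "input3"], name="col4"),
)

# Apply the helper column by column; Series.map exists in old and new pandas.
numeric = pivot.apply(lambda col: col.map(first_line_as_float))

# Build the boolean mask from the numeric frame, but index the original frame
# so the full two-line strings are preserved in the result.
mask = (numeric >= 105).any(axis=1)
filtered = pivot[mask]

print(filtered)
```

The key design choice is the same as in the answer above: the numeric frame is only used to compute the row mask, and the original string frame is what gets filtered.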

Answer 1 posted 2021-10-10 13:59:56
