pandas function any() does not return the result I want

Question

I have the following DataFrame:

import pandas as pd

df = pd.DataFrame(
    {
        'class': ['0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0'],
        'item':  ['1','1','2','2','2','3','3','3','3','3','4','4','5','5','5','5','5','5','5'],
        'last_PO_code': ['103','103','103','104','103','103','104','105','106','103','103','104','103','103','104','105','105','106','1046'],
        'qty': [3,4,3,3,2,4,4,3,3,3,5,5,2,6,8,2,6,2,6],

    }
)

I want to apply the following rules to this DataFrame, once for each unique item in the item column:

1. last_PO_code contains only '103'.
2. last_PO_code contains ( '103' & '104' ) and ( qty of '103' > qty of '104' ).
3. last_PO_code contains ( '103' & '104' & '105' & '106' ) and ( qty of '105' == qty of '106' ) and ( qty of '103' > qty of '104' ).
4. last_PO_code does not contain '103'.
5. last_PO_code contains ( '103' & '104' ) and ( qty of '103' == qty of '104' ).
6. last_PO_code contains ( '103' & '104' & '105' & '106' ) and ( qty of '105' == qty of '106' ) and ( qty of '103' == qty of '104' ).

I wrote the following code, but the result is not what I want.


regle1 = lambda x: True if x['last_PO_code'].eq('103').all() else False
regle2 = lambda x: True if x['last_PO_code'].eq('103').any() \
    and x['last_PO_code'].eq('104').any() \
    and x['last_PO_code'].eq('103').sum() > x['last_PO_code'].eq('104').sum() \
    else False
regle3 = lambda x: True if x['last_PO_code'].eq('103').any() \
    and x['last_PO_code'].eq('104').any() \
    and x['last_PO_code'].eq('105').any() \
    and x['last_PO_code'].eq('106').any() \
    and x['last_PO_code'].eq('103').sum() > x['last_PO_code'].eq('104').sum() \
    and x['last_PO_code'].eq('105').sum() == x['last_PO_code'].eq('106').sum() \
    else False
regle4 = lambda x: False if x['last_PO_code'].eq('103').any() else True

regle5 = lambda x: True if (x['last_PO_code'].eq('103').any() \
    and x['last_PO_code'].eq('104').any()) \
    and x['last_PO_code'].eq('103').sum() == x['last_PO_code'].eq('104').sum() \
    else False
regle6 = lambda x: True if x['last_PO_code'].eq('103').any() \
    and x['last_PO_code'].eq('104').any() \
    and x['last_PO_code'].eq('105').any() \
    and x['last_PO_code'].eq('106').any() \
    and x['last_PO_code'].eq('103').sum() == x['last_PO_code'].eq('104').sum() \
    and x['last_PO_code'].eq('105').sum() == x['last_PO_code'].eq('106').sum() \
    else False

df2 = df.groupby(['class','item']).apply(lambda x: pd.Series({'regle1' : regle1(x),
                                  'regle2': regle2(x),
                                  'regle3' : regle3(x)
                                  }))

Only regle1 does what I want for all items. To me, the problem seems to come from the any() function: either I am using it incorrectly or I don't understand it well.

What I have:

           regle1   regle2  regle3  regle4  regle5  regle6
class   item                        
0       1   True    False   False   False   False   False
        2   False   True    False   False   False   False
        3   False   True    True    False   False   False
        4   False   False   False   False   True    False
        5   False   True    True    False   False   False

What I want:

           regle1   regle2  regle3  regle4  regle5  regle6
class   item                        
0       1   True    False   False   False   False   False
        2   False   True    False   False   False   False
        3   False   True    True    False   False   False
        4   False   False   False   False   True    False
        5   False   False   False   False   True    True

All the mistakes I noticed are on item 5, but I don't understand why.

Answer 1

The problem is that you are counting the rows that match each 'last_PO_code' instead of summing their 'qty'. In each lambda, you need:

(x['last_PO_code'].eq('103')*x['qty']).sum()

or, as mozway suggested, even better:

x.loc[x['last_PO_code'].eq('103'), 'qty'].sum()

instead of:

x['last_PO_code'].eq('103').sum()
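
To make the difference concrete, here is a small sketch using only the rows of item 5 from the question. It assumes the final '1046' in the sample data is meant to be '106'; the variable names (item5, is_103, count_103, and so on) are only illustrative.

```python
import pandas as pd

# Rows of item 5 from the question, with the last code taken as '106'
# (assumption: '1046' in the sample data is a typo).
item5 = pd.DataFrame({
    'last_PO_code': ['103', '103', '104', '105', '105', '106', '106'],
    'qty': [2, 6, 8, 2, 6, 2, 6],
})

is_103 = item5['last_PO_code'].eq('103')
is_104 = item5['last_PO_code'].eq('104')

# Summing a boolean mask counts the matching rows.
count_103 = is_103.sum()
count_104 = is_104.sum()

# Selecting qty with the mask and then summing adds up the quantities.
qty_103 = item5.loc[is_103, 'qty'].sum()
qty_104 = item5.loc[is_104, 'qty'].sum()

print(count_103, count_104)  # 2 1
print(qty_103, qty_104)      # 8 8
```

With row counts, '103' looks larger than '104' (2 > 1), so regle2 fires for item 5. With quantities the two are equal (8 == 8), which is what regle5 is supposed to detect.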

The whole code:

regle1 = lambda x: True if x['last_PO_code'].eq('103').all() else False
regle2 = lambda x: True if x['last_PO_code'].eq('103').any() \
    and x['last_PO_code'].eq('104').any() \
    and (x['last_PO_code'].eq('103') * x['qty']).sum()  > (x['last_PO_code'].eq('104') * x['qty']).sum()  \
    else False
regle3 = lambda x: True if x['last_PO_code'].eq('103').any() \
    and x['last_PO_code'].eq('104').any() \
    and x['last_PO_code'].eq('105').any() \
    and x['last_PO_code'].eq('106').any() \
    and (x['last_PO_code'].eq('103')*x['qty']).sum() > (x['last_PO_code'].eq('104')*x['qty']).sum() \
    and (x['last_PO_code'].eq('105')*x['qty']).sum() == (x['last_PO_code'].eq('106')*x['qty']).sum() \
    else False
regle4 = lambda x: False if x['last_PO_code'].eq('103').any() else True

regle5 = lambda x: True if (x['last_PO_code'].eq('103').any() \
    and x['last_PO_code'].eq('104').any()) \
    and (x['last_PO_code'].eq('103')*x['qty']).sum() == (x['last_PO_code'].eq('104')*x['qty']).sum() \
    else False
regle6 = lambda x: True if x['last_PO_code'].eq('103').any() \
    and x['last_PO_code'].eq('104').any() \
    and x['last_PO_code'].eq('105').any() \
    and x['last_PO_code'].eq('106').any() \
    and (x['last_PO_code'].eq('103')*x['qty']).sum() == (x['last_PO_code'].eq('104')*x['qty']).sum() \
    and (x['last_PO_code'].eq('105')*x['qty']).sum() == (x['last_PO_code'].eq('106')*x['qty']).sum() \
    else False

df2 = df.groupby(['class','item']).apply(lambda x: pd.Series({'regle1' : regle1(x),
                                  'regle2' : regle2(x),
                                  'regle3' : regle3(x),
                                  'regle4' : regle4(x),
                                  'regle5' : regle5(x),
                                  'regle6' : regle6(x),
                                  }))

#               regle1  regle2  regle3  regle4  regle5  regle6
#class  item                        
#0      1       True    False   False   False   False   False
#       2       False   True    False   False   False   False
#       3       False   True    True    False   False   False
#       4       False   False   False   False   True    False
#       5       False   False   False   False   True    True

PS. At this point it may be time to use normal functions instead of lambdas to get cleaner code :D. Your lambdas also contain repeated chunks of code, which could easily be factored out.

PS2. I assumed that your example data contains a typo (there should be '106' instead of '1046').
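
As a rough illustration of that refactoring, the six rules could be written as one ordinary function that computes the per-code qty totals once and reuses them. This is only a sketch: the names evaluate_rules, present and total are hypothetical, and the sample data again assumes '1046' is a typo for '106'.

```python
import pandas as pd

df = pd.DataFrame({
    'class': ['0'] * 19,
    'item': ['1', '1', '2', '2', '2', '3', '3', '3', '3', '3',
             '4', '4', '5', '5', '5', '5', '5', '5', '5'],
    # Last value assumed to be '106' rather than '1046'.
    'last_PO_code': ['103', '103', '103', '104', '103', '103', '104', '105', '106', '103',
                     '103', '104', '103', '103', '104', '105', '105', '106', '106'],
    'qty': [3, 4, 3, 3, 2, 4, 4, 3, 3, 3, 5, 5, 2, 6, 8, 2, 6, 2, 6],
})


def evaluate_rules(group):
    """Evaluate the six rules for the rows of a single item."""
    # Total qty per code within this item; absent codes are not in the index.
    totals = group.groupby('last_PO_code')['qty'].sum()

    def present(*codes):
        # True only if every requested code occurs in this item.
        return all(code in totals.index for code in codes)

    def total(code):
        # Total qty for a code, or 0 if the code does not occur.
        return totals.get(code, 0)

    has_103_104 = present('103', '104')
    has_all_four = present('103', '104', '105', '106')
    more_103 = total('103') > total('104')
    same_103_104 = total('103') == total('104')
    same_105_106 = total('105') == total('106')

    return pd.Series({
        'regle1': bool(group['last_PO_code'].eq('103').all()),
        'regle2': bool(has_103_104 and more_103),
        'regle3': bool(has_all_four and more_103 and same_105_106),
        'regle4': not present('103'),
        'regle5': bool(has_103_104 and same_103_104),
        'regle6': bool(has_all_four and same_103_104 and same_105_106),
    })


# Selecting the needed columns before apply keeps the grouping keys out of each group.
result = df.groupby(['class', 'item'])[['last_PO_code', 'qty']].apply(evaluate_rules)
print(result)
```

Each comparison is written once, so changing a rule (or adding a seventh) touches a single line instead of several near-identical lambdas.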
