Python Pandas: counting a column with multiple conditions and groupby

Question

I have the following DataFrame:

import pandas as pd

df = pd.DataFrame([
        ['LEhOc7XSE0','2020', '03', 'car'],
        ['LEhOc7XSE0','2020', '03', 'truck'],
        ['LEhOc7XSE0','2020', '03', 'bike'],
        ['LEhOc7XSE0','2020', '03', 'insurance'],
        ['LEhOc7XSE0','2020', '03', 'inspection'],
        ['iXC5AfJMox','2020', '04', 'car'],
        ['iXC5AfJMox','2020', '04', 'truck'],
        ['iXC5AfJMox','2020', '04', 'inspection'],
        ['XpLLAySojz','2020', '01', 'bike'],
    ], columns=['order_id','year', 'month', 'item_type'])

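The order_id column is not unique: each row describes one item purchased under that order_id.

Now I want to count the orders (each unique order_id counts once) that contain either a car or a bike, but only if the order does not consist exclusively of those items.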

df = pd.DataFrame([
        ['2020','03', '1'],
        ['2020','04', '1'],
    ], columns=['year', 'month', 'count_orders_with_condition'])

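This is what the result should look like. For example, order_id = XpLLAySojz contains a bike but is omitted because it contains nothing else. The DataFrame I am working with is fairly large, which is why iterrows() performs very poorly here. I am a bit lost among the options pandas offers for solving this problem.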

Answer 1

Try:

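import numpy as np

# Flag each row: 1 if the item is a bike or a car, 0 otherwise
df['mask'] = np.where(df['item_type'].isin(['bike', 'car']), 1, 0)
# An order showing both flag values (nunique == 2) has a bike/car plus at least one other item
mask = df.groupby('order_id')['mask'].nunique()
mask = mask.loc[mask.eq(2)]

# Keep only the qualifying orders, then count unique order_ids per year and month
res = df.set_index('order_id').loc[mask.index].reset_index().groupby(['year', 'month'])['order_id'].nunique()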

Output:

>>> res

year  month
2020  03       1
      04       1
Name: order_id, dtype: int64
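
The same idea can probably also be written without adding a helper column to df. The following is only a sketch of an alternative, not part of the answer above: it assumes the question's sample data, and the names is_target, by_order, keep and result are made up for illustration. It keeps orders that have at least one car/bike row but not only car/bike rows, and names the count column as in the question's expected result.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame([
    ['LEhOc7XSE0', '2020', '03', 'car'],
    ['LEhOc7XSE0', '2020', '03', 'truck'],
    ['LEhOc7XSE0', '2020', '03', 'bike'],
    ['LEhOc7XSE0', '2020', '03', 'insurance'],
    ['LEhOc7XSE0', '2020', '03', 'inspection'],
    ['iXC5AfJMox', '2020', '04', 'car'],
    ['iXC5AfJMox', '2020', '04', 'truck'],
    ['iXC5AfJMox', '2020', '04', 'inspection'],
    ['XpLLAySojz', '2020', '01', 'bike'],
], columns=['order_id', 'year', 'month', 'item_type'])

# True for rows whose item is a car or a bike (hypothetical helper name)
is_target = df['item_type'].isin(['car', 'bike'])

# Per order: at least one car/bike row, but not exclusively car/bike rows
by_order = is_target.groupby(df['order_id'])
keep = by_order.transform('any') & ~by_order.transform('all')

# Count unique qualifying orders per year and month
result = (
    df[keep]
    .groupby(['year', 'month'])['order_id']
    .nunique()
    .reset_index(name='count_orders_with_condition')
)
print(result)
```

Note that the counts here are integers, whereas the expected frame in the question stores them as strings; cast with astype(str) if that matters.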
