
Filtering a DataFrame on multiple conditions

import pandas as pd

data = {'year': ['11:23:19', '11:23:19', '11:24:19', '11:25:19', '11:25:19', '11:23:19', '11:23:19', '11:23:19', '11:23:19', '11:23:19'],
        'store_number': ['1944', '1945', '1946', '1948', '1948', '1949', '1947', '1948', '1949', '1947'],
        'retailer_name': ['Walmart', 'Walmart', 'CRV', 'CRV', 'CRV', 'Walmart', 'Walmart', 'CRV', 'CRV', 'CRV'],
        'amount': [5, 5, 8, 6, 1, 5, 10, 6, 12, 11],
        'id': [10, 10, 11, 11, 11, 10, 10, 11, 11, 10]}

# Build the frame, index it by retailer, store and year, then group on all three index levels
stores = pd.DataFrame(data, columns=['retailer_name', 'store_number', 'year', 'amount', 'id'])
stores.set_index(['retailer_name', 'store_number', 'year'], inplace=True)
stores_grouped = stores.groupby(level=[0, 1, 2])

That looks like:

                                     amount  id
retailer_name store_number year                
Walmart       1944         11:23:19       5  10
              1945         11:23:19       5  10
CRV           1946         11:24:19       8  11
              1948         11:25:19       6  11
                           11:25:19       1  11
Walmart       1949         11:23:19       5  10
              1947         11:23:19      10  10
CRV           1948         11:23:19       6  11
              1949         11:23:19      12  11
              1947         11:23:19      11  10

I managed to filter on group size with: stores_grouped.filter(lambda x: (len(x) == 1))

But I want to filter on two conditions at once: the group has length one and its id column equals 10. Any idea how to do so?

Since filter expects a scalar bool, you can just add the condition inside the lambda, as you would in a normal if statement:

In [180]: stores_grouped.filter(lambda x: (len(x) == 1 and x['id'] == 10))
Out[180]:
                                     amount  id
retailer_name store_number year                
Walmart       1944         11:23:19       5  10
              1945         11:23:19       5  10
              1949         11:23:19       5  10
              1947         11:23:19      10  10
CRV           1947         11:23:19      11  10
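
In the lambda above, Python's `and` hands back the one-row comparison `x['id'] == 10` itself whenever the length check passes, rather than a plain bool. A minimal self-contained sketch that makes the scalar explicit might look like the following; the helper name `single_row_with_id_10` is hypothetical and not part of the original answer.

```python
import pandas as pd

data = {
    'year': ['11:23:19', '11:23:19', '11:24:19', '11:25:19', '11:25:19',
             '11:23:19', '11:23:19', '11:23:19', '11:23:19', '11:23:19'],
    'store_number': ['1944', '1945', '1946', '1948', '1948',
                     '1949', '1947', '1948', '1949', '1947'],
    'retailer_name': ['Walmart', 'Walmart', 'CRV', 'CRV', 'CRV',
                      'Walmart', 'Walmart', 'CRV', 'CRV', 'CRV'],
    'amount': [5, 5, 8, 6, 1, 5, 10, 6, 12, 11],
    'id': [10, 10, 11, 11, 11, 10, 10, 11, 11, 10],
}
stores = pd.DataFrame(data).set_index(['retailer_name', 'store_number', 'year'])
stores_grouped = stores.groupby(level=[0, 1, 2])


def single_row_with_id_10(group):
    """Hypothetical helper: True only for one-row groups whose id is 10."""
    # The length check runs first, so .iloc[0] is only reached for one-row groups.
    return len(group) == 1 and bool(group['id'].iloc[0] == 10)


result = stores_grouped.filter(single_row_with_id_10)
print(result)
```

Returning a plain Python bool keeps the function within what filter documents as its expected return type, so it does not rely on how a one-element Series is interpreted.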

You can use:

print (stores_grouped.filter(lambda x: (len(x) == 1) & (x.id == 10).all()))
                                     amount  id
retailer_name store_number year                
Walmart       1944         11:23:19       5  10
              1945         11:23:19       5  10
              1949         11:23:19       5  10
              1947         11:23:19      10  10
CRV           1947         11:23:19      11  10

I'd do it this way:

In [348]: stores_grouped.filter(lambda x: (len(x) == 1)).query('id == 10')
Out[348]:
                                     amount  id
retailer_name store_number year
Walmart       1944         11:23:19       5  10
              1945         11:23:19       5  10
              1949         11:23:19       5  10
              1947         11:23:19      10  10
CRV           1947         11:23:19      11  10

Thinking outside the box, use drop_duplicates with keep=False on the flat frame (df here is assumed to be the DataFrame before set_index, so the three keys are still ordinary columns):

df.drop_duplicates(subset=['retailer_name', 'store_number', 'year'], keep=False) \
    .query('id == 10')
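
As a rough check that this shortcut selects the same rows as the groupby filter, here is a small sketch. It assumes `df` is the flat frame built from the question's data before `set_index`; the names `keys`, `via_dedup`, `via_filter` and `same_rows` are illustrative only.

```python
import pandas as pd

data = {
    'year': ['11:23:19', '11:23:19', '11:24:19', '11:25:19', '11:25:19',
             '11:23:19', '11:23:19', '11:23:19', '11:23:19', '11:23:19'],
    'store_number': ['1944', '1945', '1946', '1948', '1948',
                     '1949', '1947', '1948', '1949', '1947'],
    'retailer_name': ['Walmart', 'Walmart', 'CRV', 'CRV', 'CRV',
                      'Walmart', 'Walmart', 'CRV', 'CRV', 'CRV'],
    'amount': [5, 5, 8, 6, 1, 5, 10, 6, 12, 11],
    'id': [10, 10, 11, 11, 11, 10, 10, 11, 11, 10],
}
keys = ['retailer_name', 'store_number', 'year']
df = pd.DataFrame(data)  # assumption: the flat frame, keys still ordinary columns

# keep=False drops every row whose key combination occurs more than once,
# which leaves exactly the groups of length one.
via_dedup = (
    df.drop_duplicates(subset=keys, keep=False)
      .query('id == 10')
      .set_index(keys)
)

# The groupby-based route from the earlier answers, for comparison.
via_filter = (
    df.set_index(keys)
      .groupby(level=[0, 1, 2])
      .filter(lambda g: len(g) == 1 and bool((g['id'] == 10).all()))
)

# Compare row by row, index values included.
same_rows = (via_dedup.reset_index().values.tolist()
             == via_filter.reset_index().values.tolist())
print(via_dedup)
print(same_rows)
```

The two routes only coincide because the grouping keys are exactly the columns passed to `subset`; grouping on anything else would need the groupby version.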



Timing

[Image: timing results]
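
To compare the approaches from this thread on your own machine, a timing sketch along these lines could be used. The labels, the `candidates` dict and the repeat counts are arbitrary choices for illustration, and on a ten-row frame the numbers mostly reflect fixed overhead, so a larger frame would be needed for a meaningful comparison.

```python
import timeit

import pandas as pd

data = {
    'year': ['11:23:19', '11:23:19', '11:24:19', '11:25:19', '11:25:19',
             '11:23:19', '11:23:19', '11:23:19', '11:23:19', '11:23:19'],
    'store_number': ['1944', '1945', '1946', '1948', '1948',
                     '1949', '1947', '1948', '1949', '1947'],
    'retailer_name': ['Walmart', 'Walmart', 'CRV', 'CRV', 'CRV',
                      'Walmart', 'Walmart', 'CRV', 'CRV', 'CRV'],
    'amount': [5, 5, 8, 6, 1, 5, 10, 6, 12, 11],
    'id': [10, 10, 11, 11, 11, 10, 10, 11, 11, 10],
}
keys = ['retailer_name', 'store_number', 'year']
df = pd.DataFrame(data)      # flat frame (assumed input for drop_duplicates)
stores = df.set_index(keys)  # indexed frame used by the groupby answers

# One zero-argument callable per approach discussed in the thread.
candidates = {
    'filter, both conditions': lambda: stores.groupby(level=[0, 1, 2]).filter(
        lambda g: len(g) == 1 and bool((g['id'] == 10).all())),
    'filter + query': lambda: stores.groupby(level=[0, 1, 2]).filter(
        lambda g: len(g) == 1).query('id == 10'),
    'drop_duplicates + query': lambda: df.drop_duplicates(
        subset=keys, keep=False).query('id == 10'),
}

runs = 20
timings = {}
for label, fn in candidates.items():
    # Best of three batches, reported as seconds per single call.
    timings[label] = min(timeit.repeat(fn, number=runs, repeat=3)) / runs
    print(f'{label}: {timings[label] * 1e3:.3f} ms per call')
```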
