Filter group by result in DataFrame

I have a DataFrame named 'concated'. It has the columns 'amount' (transaction amounts) and 'mcccode_trtype' (merchant type). I need to count only the negative-amount transactions per merchant code and compute their mean. I also need to keep only merchants with more than 10 such transactions.

So, I wrote this code:

res=concated[concated.amount<0].groupby('mcccode_trtype')['amount'].agg(['count', 'mean'])

The problem: how can I filter this result by the 'count' column in one line? I guess I should use 'filter' with a lambda function, but I could not get the syntax right.

Please help.

Sample data:

import pandas as pd
from numpy import nan
concated=pd.DataFrame(
{0: {'amount': -2245.92,
  'customer_id': 39026145,
  'gender': 1.0,
  'mcc_code': 4814,
  'mcc_description': 'DESCR1',
  'mcccode_trtype': '48141030',
  'term_id': nan,
  'tr_datetime': '0 10:23:26',
  'tr_description': 'DESCR2',
  'tr_type': 1030},
 1: {'amount': -5614.79,
  'customer_id': 39026145,
  'gender': 1.0,
  'mcc_code': 4814,
  'mcc_description': 'DESCR1',
  'mcccode_trtype': '48141030',
  'term_id': nan,
  'tr_datetime': '6 07:08:31',
  'tr_description': 'DESCR2',
  'tr_type': 1030},
 2: {'amount': -1122.96,
  'customer_id': 39026145,
  'gender': 1.0,
  'mcc_code': 4814,
  'mcc_description': 'DESCR1',
  'mcccode_trtype': '48141030',
  'term_id': nan,
  'tr_datetime': '8 07:06:10',
  'tr_description': 'DESCR2',
  'tr_type': 1030},
 3: {'amount': -2245.92,
  'customer_id': 39026145,
  'gender': 1.0,
  'mcc_code': 4814,
  'mcc_description': 'DESCR1',
  'mcccode_trtype': '48141030',
  'term_id': nan,
  'tr_datetime': '11 08:49:03',
  'tr_description': 'DESCR2',
  'tr_type': 1030},
 4: {'amount': -2245.92,
  'customer_id': 39026145,
  'gender': 1.0,
  'mcc_code': 4814,
  'mcc_description': 'DESCR1',
  'mcccode_trtype': '48141030',
  'term_id': nan,
  'tr_datetime': '11 14:12:08',
  'tr_description': 'DESCR2',
  'tr_type': 1030},
 5: {'amount': -2245.92,
  'customer_id': 39026145,
  'gender': 1.0,
  'mcc_code': 4814,
  'mcc_description': 'DESCR1',
  'mcccode_trtype': '48141030',
  'term_id': nan,
  'tr_datetime': '11 14:15:30',
  'tr_description': 'DESCR2',
  'tr_type': 1030},
 6: {'amount': -2245.92,
  'customer_id': 39026145,
  'gender': 1.0,
  'mcc_code': 4814,
  'mcc_description': 'DESCR1',
  'mcccode_trtype': '48141030',
  'term_id': nan,
  'tr_datetime': '13 11:17:34',
  'tr_description': 'DESCR2',
  'tr_type': 1030},
 7: {'amount': -2245.92,
  'customer_id': 39026145,
  'gender': 1.0,
  'mcc_code': 4814,
  'mcc_description': 'DESCR1',
  'mcccode_trtype': '48141030',
  'term_id': nan,
  'tr_datetime': '18 07:39:05',
  'tr_description': 'DESCR2',
  'tr_type': 1030},
 8: {'amount': -449.18,
  'customer_id': 39026145,
  'gender': 1.0,
  'mcc_code': 4814,
  'mcc_description': 'DESCR1',
  'mcccode_trtype': '48141030',
  'term_id': nan,
  'tr_datetime': '19 13:24:35',
  'tr_description': 'DESCR2',
  'tr_type': 1030},
 9: {'amount': -1122.96,
  'customer_id': 39026145,
  'gender': 1.0,
  'mcc_code': 4814,
  'mcc_description': 'DESCR1',
  'mcccode_trtype': '48141030',
  'term_id': nan,
  'tr_datetime': '19 13:25:31',
  'tr_description': 'DESCR2',
  'tr_type': 1030}}).transpose()
concated

You can use .loc in combination with a lambda function as follows:

res = concated[concated.amount<0].groupby('mcccode_trtype')['amount'].agg(['count', 'mean']).loc[lambda x: x["count"] > 10]
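As a minimal sketch of how this works, the snippet below applies the same chain to a small made-up frame. The data and the threshold of 2 are assumptions chosen so that the toy example returns a row; the column names follow the question. A callable passed to .loc receives the aggregated frame and returns a boolean mask, which is what lets the filter sit at the end of the chain.

```python
import pandas as pd

# Hypothetical toy data, not the asker's real transactions.
df = pd.DataFrame({
    "mcccode_trtype": ["A", "A", "A", "B", "B", "C"],
    "amount": [-10.0, -20.0, -30.0, -5.0, 15.0, -7.0],
})

# Count and mean of the negative amounts per merchant code.
stats = (
    df[df["amount"] < 0]
    .groupby("mcccode_trtype")["amount"]
    .agg(["count", "mean"])
)

# The lambda gets the aggregated frame and returns a boolean mask.
# The threshold 2 is only for this toy data; the question uses 10.
res = stats.loc[lambda x: x["count"] > 2]
print(res)
```

Only merchant "A" has more than two negative transactions here, so it is the only row left.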

Use DataFrame.query to filter by the 'count' column:

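# if necessary: cast to float (the sample data ends up as object dtype after the transpose)
concated.amount = concated.amount.astype(float)

# keep only merchants with more than 10 negative transactions
res=concated[concated.amount<0].groupby('mcccode_trtype')['amount'].agg(['count', 'mean']).query('count > 10')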
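A small self-contained sketch of the same idea follows. The rows and the threshold of 2 are made up for illustration. The frame is built row-wise and transposed, as in the question's sample data, which is why the cast to float is needed before aggregating.

```python
import pandas as pd

# Hypothetical toy rows, built row-wise and transposed like the sample data.
rows = {
    0: {"amount": -10.0, "mcccode_trtype": "A"},
    1: {"amount": -20.0, "mcccode_trtype": "A"},
    2: {"amount": -30.0, "mcccode_trtype": "A"},
    3: {"amount": -5.0, "mcccode_trtype": "B"},
    4: {"amount": 15.0, "mcccode_trtype": "B"},
}
df = pd.DataFrame(rows).transpose()

# Each original column mixed floats and strings, so every column is
# object dtype after the transpose; cast before aggregating.
df["amount"] = df["amount"].astype(float)

res = (
    df[df["amount"] < 0]
    .groupby("mcccode_trtype")["amount"]
    .agg(["count", "mean"])
    .query("count > 2")  # threshold 2 is for the toy data only
)
print(res)
```

Compared with the .loc variant, query takes the condition as a string, which some find easier to read at the end of a long chain.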

