
Python Pandas: Is There a Faster Way to Split and Recombine a DataFrame based on criteria?

I want to group this DataFrame by a particular column, "ContactID", but if a group's "PaymentType" column doesn't include a particular value, I want to remove that entire group from the DataFrame.

I have something like this:

import pandas as pd

UniqueID = data.drop_duplicates('ContactID')['ContactID'].tolist()
OnlyRefinance = []
for i in UniqueID:
    # Pull out one contact's rows at a time
    splits = data[data['ContactID'] == i].reset_index(drop=True)
    # Keep the group only if it contains PaymentType 160
    if (splits['PaymentType'] == 160).any():
        OnlyRefinance.append(splits)
OnlyRefinance = pd.concat(OnlyRefinance)

This works but it's VERY slow and I was wondering if there was a faster way to accomplish this.

Another option: you can use groupby.filter:

data.groupby("ContactID").filter(lambda g: (g.PaymentType == 160).any())

This keeps only the groups whose PaymentType contains 160.
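As a minimal sketch on made-up sample data (the ContactID and PaymentType values below are illustrative, not from the question), the filter drops contact 2, whose group never contains 160:

```python
import pandas as pd

# Hypothetical data: contacts 1 and 3 have a PaymentType-160 row, contact 2 does not
data = pd.DataFrame({
    "ContactID":   [1,   1,   2,   2,   3],
    "PaymentType": [160, 200, 210, 220, 160],
})

# Keep only groups containing at least one PaymentType == 160
kept = data.groupby("ContactID").filter(lambda g: (g.PaymentType == 160).any())

print(kept["ContactID"].unique())  # contacts 1 and 3 survive; contact 2 is dropped
```

Note that filter with a Python lambda still visits each group, so it is clearer than the manual loop but not always dramatically faster.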

You can do this more simply:

to_keep = data.loc[data['PaymentType'] == 160, 'ContactID'].unique()
data[data['ContactID'].isin(to_keep)]

First select the rows where the condition is met and take the unique ContactIDs we want to keep,

then pass these to isin, which builds a boolean mask keeping every row whose ContactID is in that array. This avoids the per-group Python loop entirely, so it is much faster on large frames.
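A quick sketch of this isin approach, again on made-up sample data (the values are illustrative assumptions, not from the question):

```python
import pandas as pd

# Hypothetical data: contacts 1 and 3 have a PaymentType-160 row, contact 2 does not
data = pd.DataFrame({
    "ContactID":   [1,   1,   2,   2,   3],
    "PaymentType": [160, 200, 210, 220, 160],
})

# Unique ContactIDs that have at least one PaymentType == 160
to_keep = data.loc[data["PaymentType"] == 160, "ContactID"].unique()

# Vectorized membership test: keep every row belonging to those contacts
result = data[data["ContactID"].isin(to_keep)]
```

Because both steps are vectorized, this scales far better than iterating over ContactIDs one at a time.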
