
Python Pandas: Is There a Faster Way to Split and Recombine a DataFrame based on criteria?

I want to group this DataFrame by a particular column, "ContactID", but if a group's "PaymentType" column doesn't include a particular value, I want to remove that entire group from the DataFrame.

I have something like this:

import pandas as pd

UniqueID = data.drop_duplicates('ContactID')['ContactID'].tolist()
OnlyRefinance = []
for i in UniqueID:
    # Pull out one contact's rows at a time
    splits = data[data['ContactID'] == i].reset_index(drop=True)
    # Keep the group only if it contains PaymentType 160
    if (splits['PaymentType'] == 160).any():
        OnlyRefinance.append(splits)
OnlyRefinance = pd.concat(OnlyRefinance)

This works but it's VERY slow and I was wondering if there was a faster way to accomplish this.

Another option: you can use groupby.filter:

data.groupby("ContactID").filter(lambda g: (g.PaymentType == 160).any())

This keeps only the groups whose PaymentType contains 160.
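As a minimal sketch on made-up sample data (the ContactID and PaymentType values below are illustrative, not from the question), the filter drops contact 2, whose group never contains 160:

```python
import pandas as pd

# Hypothetical data: contacts 1 and 3 have a PaymentType-160 row, contact 2 does not
data = pd.DataFrame({
    "ContactID":   [1,   1,   2,   2,   3],
    "PaymentType": [160, 200, 210, 220, 160],
})

# Keep only groups containing at least one PaymentType == 160
kept = data.groupby("ContactID").filter(lambda g: (g.PaymentType == 160).any())

print(kept["ContactID"].unique())  # contacts 1 and 3 survive; contact 2 is dropped
```

Note that filter with a Python lambda still visits each group, so it is clearer than the manual loop but not always dramatically faster.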

You can do this more simply:

to_keep = data.loc[data['PaymentType'] == 160, 'ContactID'].unique()
data[data['ContactID'].isin(to_keep)]

First select the rows where the condition is met and take the unique ContactIDs we want to keep,

then pass these to isin, which builds a boolean mask keeping every row whose ContactID is in that array. This avoids the per-group Python loop entirely, so it is much faster on large frames.
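A quick sketch of this isin approach, again on made-up sample data (the values are illustrative assumptions, not from the question):

```python
import pandas as pd

# Hypothetical data: contacts 1 and 3 have a PaymentType-160 row, contact 2 does not
data = pd.DataFrame({
    "ContactID":   [1,   1,   2,   2,   3],
    "PaymentType": [160, 200, 210, 220, 160],
})

# Unique ContactIDs that have at least one PaymentType == 160
to_keep = data.loc[data["PaymentType"] == 160, "ContactID"].unique()

# Vectorized membership test: keep every row belonging to those contacts
result = data[data["ContactID"].isin(to_keep)]
```

Because both steps are vectorized, this scales far better than iterating over ContactIDs one at a time.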
