I want to group this DataFrame based on a particular column "ContactID", but if the group's column "PaymentType" doesn't include a particular value, then I want to remove the entire group from the DataFrame.
I have something like this:
UniqueID = data.drop_duplicates('ContactID')['ContactID'].tolist()
OnlyRefinance = []
for i in UniqueID:
    splits = data[data['ContactID'] == i].reset_index(drop=True)
    if any(splits['PaymentType'] == 160):
        OnlyRefinance.append(splits)
OnlyRefinance = pd.concat(OnlyRefinance)
This works but it's VERY slow and I was wondering if there was a faster way to accomplish this.
Another option is groupby.filter:
data.groupby("ContactID").filter(lambda g: (g.PaymentType == 160).any())
This keeps only the groups whose PaymentType contains 160. Note that filter still calls the lambda once per group in Python, so it can be slow when there are many groups.
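A runnable sketch of the groupby.filter approach on a small made-up DataFrame (the data here is hypothetical; only the column names ContactID and PaymentType come from the question):

```python
import pandas as pd

# Hypothetical sample data using the question's column names.
data = pd.DataFrame({
    "ContactID": [1, 1, 2, 2, 3],
    "PaymentType": [160, 200, 300, 400, 160],
})

# Keep only the groups that contain at least one row with PaymentType == 160.
result = data.groupby("ContactID").filter(lambda g: (g.PaymentType == 160).any())

# ContactID 2 is dropped entirely, because none of its rows have PaymentType 160;
# all rows for ContactIDs 1 and 3 survive.
print(result)
```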
You can do this more simply:
to_keep = data.loc[data['PaymentType'] == 160, 'ContactID'].unique()
data[data['ContactID'].isin(to_keep)]
First select the rows where PaymentType is 160 and collect their unique ContactIDs; these are the groups you want to keep. Then pass that array to isin to build a boolean mask, which drops every row whose ContactID is not in the array. (Note the condition in the question is "remove groups that don't contain 160", so you keep the IDs that match rather than inverting the mask with ~.)
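A self-contained sketch of the isin approach, keeping the groups that contain 160 as the question asks (the sample data is hypothetical; only the column names come from the question):

```python
import pandas as pd

# Hypothetical sample data using the question's column names.
data = pd.DataFrame({
    "ContactID": [1, 1, 2, 2, 3],
    "PaymentType": [160, 200, 300, 400, 160],
})

# ContactIDs that have at least one row with PaymentType == 160.
to_keep = data.loc[data["PaymentType"] == 160, "ContactID"].unique()

# Boolean-mask the frame: keep rows whose ContactID appears in to_keep.
# This is a vectorised equivalent of the per-group loop in the question.
result = data[data["ContactID"].isin(to_keep)]
print(result)
```

Because this builds the mask in a single vectorised pass instead of slicing the DataFrame once per ContactID, it scales much better than the original loop.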