I have a dataframe that has the following columns. I want to group the data by the order number and then drop all the groups that do not contain one specific item.
order_id | product_id | purchase_date |
---|---|---|
1234 | 23546.0. | 2020-01-10. |
1234. | 32423.0 | 2020-01-10. |
5678. | 43244.0. | 2020-02-10. |
When I use the line below, it doesn't drop order_id 5678:
df6 = df2.groupby(by='order_id').filter(lambda df2: df2['product_id'] == 23546.0)
I get the error: 'DataFrame' object is not callable
Use:
df.loc[df['product_id'].eq('23546.0.').groupby(df['order_id']).transform('any')]
order_id product_id purchase_date
0 1234.0 23546.0. 2020-01-10.
1 1234.0 32423.0 2020-01-10.
If product_id is stored as a float, compare against the number instead of the string:
df.loc[df['product_id'].eq(23546.0).groupby(df['order_id']).transform('any')]
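To see what the mask is doing, here is a minimal self-contained sketch. It assumes the sample data from the question with product_id stored as float; the frame is rebuilt by hand, and the names `is_target`, `mask` and `result` are only illustrative.

```python
import pandas as pd

# Sample data rebuilt from the question (assumption: product_id is float)
df = pd.DataFrame({
    "order_id": [1234, 1234, 5678],
    "product_id": [23546.0, 32423.0, 43244.0],
    "purchase_date": ["2020-01-10", "2020-01-10", "2020-02-10"],
})

# True only on the rows whose product_id equals the target item
is_target = df["product_id"].eq(23546.0)

# Within each order, mark every row True if any row of that order matched
mask = is_target.groupby(df["order_id"]).transform("any")

# Keep only the rows of orders that contain the target item
result = df.loc[mask]
print(result)
```

Because `transform` returns a Series aligned with the original index, the mask can be passed straight to `df.loc` without merging anything back.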
Another solution:
df_out = df.groupby(by="order_id").filter(lambda x: 23546.0 in x["product_id"].values)
print(df_out)
Prints:
order_id product_id purchase_date
0 1234.0 23546.0 2020-01-10
1 1234.0 32423.0 2020-01-10
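As far as I can tell, the attempt in the question fails because the function passed to `filter` has to return a single True/False per group, while `df2['product_id'] == 23546.0` returns a whole boolean Series. A minimal sketch of a variant that reduces the comparison with `.any()`, assuming the question's sample data (rebuilt by hand here) and a float product_id:

```python
import pandas as pd

# Sample data rebuilt from the question (assumption: product_id is float)
df2 = pd.DataFrame({
    "order_id": [1234, 1234, 5678],
    "product_id": [23546.0, 32423.0, 43244.0],
    "purchase_date": ["2020-01-10", "2020-01-10", "2020-02-10"],
})

target = 23546.0

# .any() collapses the per-row comparison into one boolean per order,
# which is what filter() expects from its function
df6 = df2.groupby("order_id").filter(
    lambda g: (g["product_id"] == target).any()
)
print(df6)
```

On larger frames the `transform('any')` mask is usually the faster option, since `filter` calls the Python function once per group.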