I have a data frame like so:
|transaction_id|category|
-------------------------
|1234 |Book |
|1234 |Car |
|1234 |TV |
|1235 |Car |
|1235 |TV |
|1236 |Car |
And basically, I want to group by transaction_id and create a column that flags whether or not a transaction_id had a corresponding TV in the category column, so ideally the resulting data frame would look like this:
|transaction_id|HasTV?|
-----------------------
|1234 |Y |
|1235 |Y |
|1236 |N |
I'm using pandas and I know how to use the groupby function; I've just never had to do something like this where there's a conditional check before
One option is to look at .unique() for the categories and then operate on the resulting Series:
In [28]: df.groupby("transaction_id")['category'].unique().apply(lambda x: 'TV' in x)
Out[28]:
transaction_id
1234.0 True
1235.0 True
1236.0 False
Name: category, dtype: bool
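To get from that boolean Series to the Y/N table in the question, something like the following might work. This is only a sketch: the frame is rebuilt by hand from the sample data in the question (with integer ids), and the `map`/`rename`/`reset_index` steps are my assumption about the desired final shape, not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question (ids assumed to be integers).
df = pd.DataFrame({
    "transaction_id": [1234, 1234, 1234, 1235, 1235, 1236],
    "category": ["Book", "Car", "TV", "Car", "TV", "Car"],
})

# One boolean per transaction: does its set of categories contain 'TV'?
has_tv = (
    df.groupby("transaction_id")["category"]
      .unique()
      .apply(lambda x: "TV" in x)
)

# Turn True/False into Y/N and move transaction_id back into a column.
result = has_tv.map({True: "Y", False: "N"}).rename("HasTV?").reset_index()
print(result)
```

If the booleans are good enough, the last step can be skipped and `has_tv` used directly.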
Another possibly faster but more obfuscated version would be to test for the desired category up front and then do the groupby:
In [29]: (df['category'] == 'TV').groupby(df["transaction_id"]).max()
Out[29]:
transaction_id
1234.0 True
1235.0 True
1236.0 False
Name: category, dtype: bool
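The reason .max() works here is that True sorts above False, so the maximum of a group of booleans is True as soon as one row in the group matches. If that reads as too obscure, .any() should say the same thing more directly. A small sketch, again assuming the sample data from the question rebuilt by hand; the variable names are only for illustration:

```python
import pandas as pd

# Sample data copied from the question (ids assumed to be integers).
df = pd.DataFrame({
    "transaction_id": [1234, 1234, 1234, 1235, 1235, 1236],
    "category": ["Book", "Car", "TV", "Car", "TV", "Car"],
})

# Row-level test first, then group the boolean Series by transaction_id.
is_tv = df["category"] == "TV"

flag_max = is_tv.groupby(df["transaction_id"]).max()  # True if any row is True
flag_any = is_tv.groupby(df["transaction_id"]).any()  # same idea, spelled out

print(flag_any)
print(flag_max.equals(flag_any))
```

Both variants compare the whole column in one vectorized step and avoid calling a Python lambda per group, which is presumably where any speed-up would come from; I have not benchmarked it.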