I have a huge pandas DataFrame that looks like this:
id type price min max
1 ch 10 10 100
1 fo 8 20 100
1 dr 7 10 90
1 ad 5 16 20
1 dr 6 10 90
1 fo 4 20 100
2 ch 5 40 50
2 fo 3 10 50
2 ch 3 40 50
... ... ... ... ...
I would like to add a new column 'match' to get something like this:
id type price min max match
1 ch 10 10 100 false
1 fo 8 20 100 false
1 dr 7 10 90 false
1 ad 5 16 20 false
1 dr 6 10 90 true
1 fo 4 20 100 true
2 ch 5 40 50 false
2 fo 3 10 50 false
2 ch 3 40 50 true
... ... ... ... ... ...
I tried using shift:
import numpy as np
df['match'] = np.where((df['id'] == df['id'].shift()) & (df['type'] == df['type'].shift()) & (df['min'] == df['min'].shift()) & (df['max'] == df['max'].shift()), True, False)
but that only compares each row with the one directly above it. There is no fixed number of previous rows to look back over, so a matching row can be any distance earlier. I would like to use the id as the window within which rows are compared. Is there a way to do that?
Any suggestions are highly appreciated.
Thank you
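The limitation described in the question can be reproduced with a small sketch. It rebuilds the sample data from the table above (only the rows shown; the "..." rows are not known) and applies a shift-based comparison. The column list `cols` and the column name `match_shift` are illustrative names, not from the original post.

```python
import pandas as pd

# Sample rows reconstructed from the table in the question
df = pd.DataFrame({
    "id":    [1, 1, 1, 1, 1, 1, 2, 2, 2],
    "type":  ["ch", "fo", "dr", "ad", "dr", "fo", "ch", "fo", "ch"],
    "price": [10, 8, 7, 5, 6, 4, 5, 3, 3],
    "min":   [10, 20, 10, 16, 10, 20, 40, 10, 40],
    "max":   [100, 100, 90, 20, 90, 100, 50, 50, 50],
})

cols = ["id", "type", "min", "max"]

# shift() aligns each row with the row directly above it only, so a repeat
# that sits two or more rows earlier is never detected.
df["match_shift"] = (df[cols] == df[cols].shift()).all(axis=1)

print(df["match_shift"].tolist())
```

On this data no two adjacent rows agree on all four columns, so every value comes out False, even though rows 4, 5 and 8 repeat earlier rows with the same id.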
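You could use duplicated, specifying the subset of columns to consider. Because id is part of the subset, a row is only flagged when an earlier row with the same id also has the same type, min and max, so id effectively acts as the window. With the default keep='first', the first occurrence in each group stays False: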
df.assign(match=df.duplicated(subset=['id', 'type', 'min', 'max']))
id type price min max match
0 1 ch 10 10 100 False
1 1 fo 8 20 100 False
2 1 dr 7 10 90 False
3 1 ad 5 16 20 False
4 1 dr 6 10 90 True
5 1 fo 4 20 100 True
6 2 ch 5 40 50 False
7 2 fo 3 10 50 False
8 2 ch 3 40 50 True
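For anyone who wants to run this end to end, here is a self-contained sketch that rebuilds the sample rows and applies the same duplicated call. The groupby/cumcount line is an alternative added here for comparison, not part of the original answer: it numbers rows within each (id, type, min, max) group and marks every row after the first, which makes the "id as a window" idea explicit. The names `key` and `match_cumcount` are illustrative.

```python
import pandas as pd

# Sample rows reconstructed from the table in the question
df = pd.DataFrame({
    "id":    [1, 1, 1, 1, 1, 1, 2, 2, 2],
    "type":  ["ch", "fo", "dr", "ad", "dr", "fo", "ch", "fo", "ch"],
    "price": [10, 8, 7, 5, 6, 4, 5, 3, 3],
    "min":   [10, 20, 10, 16, 10, 20, 40, 10, 40],
    "max":   [100, 100, 90, 20, 90, 100, 50, 50, 50],
})

key = ["id", "type", "min", "max"]

# True for every row whose key values already appeared in an earlier row;
# keep="first" (the default) leaves the first occurrence False.
df["match"] = df.duplicated(subset=key)

# Alternative: cumcount() numbers rows 0, 1, 2, ... inside each key group,
# so anything above 0 has an earlier row with the same key.
df["match_cumcount"] = df.groupby(key).cumcount() > 0

print(df)
```

Both columns should agree; duplicated is the shorter call, while the groupby version is easier to extend if you later need, say, the position of each repeat within its group.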