
How to compare subset of rows in pandas dataframe

I have a huge pandas DataFrame that looks like this:

id        type        price         min           max

1          ch           10          10            100
1          fo           8           20            100
1          dr           7           10            90
1          ad           5           16            20
1          dr           6           10            90
1          fo           4           20            100
2          ch           5           40            50
2          fo           3           10            50
2          ch           3           40            50
...        ...          ...         ...           ... 

I would like to add a new column 'match' to get something like this:

id         type         price       min           max     match

1          ch           10          10            100     false
1          fo           8           20            100     false
1          dr           7           10            90      false
1          ad           5           16            20      false
1          dr           6           10            90      true
1          fo           4           20            100     true
2          ch           5           40            50      false
2          fo           3           10            50      false
2          ch           3           40            50      true
...        ...          ...         ...           ...     ...

I tried using shift:

 import numpy as np
 df['match'] = np.where((df['id'] == df['id'].shift()) & (df['type'] == df['type'].shift()) & (df['min'] == df['min'].shift()) & (df['max'] == df['max'].shift()), True, False)

but that only compares the current row with the previous one. There is no fixed pattern for how many rows back a matching row might be. I would like to use the id as a window and compare each row against the earlier rows with the same id. Is there a way to do that?
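To see the limitation concretely, here is a minimal sketch that rebuilds the sample rows from the question and runs the same shift-based comparison (written with a column list instead of four separate conditions; the column name match_shift is just for illustration). Every row is compared only with the row directly above it, and in this sample no two consecutive rows share the same type, so nothing is flagged, although rows 4, 5 and 8 should be.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "id":    [1, 1, 1, 1, 1, 1, 2, 2, 2],
    "type":  ["ch", "fo", "dr", "ad", "dr", "fo", "ch", "fo", "ch"],
    "price": [10, 8, 7, 5, 6, 4, 5, 3, 3],
    "min":   [10, 20, 10, 16, 10, 20, 40, 10, 40],
    "max":   [100, 100, 90, 20, 90, 100, 50, 50, 50],
})

cols = ["id", "type", "min", "max"]

# shift() moves every column down by one row, so each row is compared
# with its immediate predecessor only (row 0 is compared with NaN).
df["match_shift"] = (df[cols] == df[cols].shift()).all(axis=1)

print(df[["id", "type", "match_shift"]])
```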

Any suggestions are highly appreciated.

Thank you

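You could use duplicated, specifying the subset of columns to consider. Because 'id' is part of the subset, a row is only flagged when an earlier row with the same id also has the same type, min and max: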

df.assign(match=df.duplicated(subset=['id', 'type', 'min', 'max']))

   id type  price  min  max  match
0   1   ch     10   10  100  False
1   1   fo      8   20  100  False
2   1   dr      7   10   90  False
3   1   ad      5   16   20  False
4   1   dr      6   10   90   True
5   1   fo      4   20  100   True
6   2   ch      5   40   50  False
7   2   fo      3   10   50  False
8   2   ch      3   40   50   True
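A short sketch of a few variations, again assuming the sample rows from the question. Note that assign returns a new DataFrame, so the column has to be stored explicitly to keep it. keep=False is a standard option of duplicated that flags every member of a repeated group, including the first one. The groupby/cumcount line is an alternative formulation (not part of the answer above) that expresses the "id as a window" idea directly: it numbers rows within each (id, type, min, max) group, and any row numbered above 0 has an earlier match. The extra column names has_twin and match_cumcount are hypothetical.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "id":    [1, 1, 1, 1, 1, 1, 2, 2, 2],
    "type":  ["ch", "fo", "dr", "ad", "dr", "fo", "ch", "fo", "ch"],
    "price": [10, 8, 7, 5, 6, 4, 5, 3, 3],
    "min":   [10, 20, 10, 16, 10, 20, 40, 10, 40],
    "max":   [100, 100, 90, 20, 90, 100, 50, 50, 50],
})

cols = ["id", "type", "min", "max"]

# Default keep="first": every repeat is flagged except its first occurrence.
df["match"] = df.duplicated(subset=cols)

# keep=False: flag every row that has at least one identical row,
# including the first occurrence.
df["has_twin"] = df.duplicated(subset=cols, keep=False)

# Alternative: number the rows within each (id, type, min, max) group;
# a row numbered above 0 has an earlier matching row in the same id.
df["match_cumcount"] = df.groupby(cols).cumcount() > 0

print(df)
```

The match column reproduces the expected output from the question, and match_cumcount gives the same values.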

The technical post webpages of this site follow the CC BY-SA 4.0 license. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.
