
Add an index in pandas based on each occurrence of a specific value in another column

I have a dataframe like so:

category name   age 
parent  harry   29
child   smith   12
parent  sally   41
child   david   19
child   mike    16

I want to add a column that groups rows into families, starting a new family at each occurrence of the value 'parent' in the category column (the dataframe is already in order). As in:

category name   age  family_id
parent  harry   29     0
child   smith   12     0
parent  sally   41     1
child   david   19     1
child   mike    16     1

I am trying to make the family_id be an incrementing integer.

I've tried a bunch of groupby approaches and am currently trying to write my own apply function, but it's very slow and not working as expected. I haven't been able to find an example that groups rows based on each occurrence of a particular column value.

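You can use eq to test whether the category column equals 'parent', then cumsum to keep a running count of the matches. Because the first row is a parent, the running count starts at 1 here, so sub(1) shifts it to start at 0: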

df['family_id'] = df['category'].eq('parent').cumsum().sub(1)
print(df)

  category   name  age  family_id
0   parent  harry   29          0
1    child  smith   12          0
2   parent  sally   41          1
3    child  david   19          1
4    child   mike   16          1
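
For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch. It rebuilds the sample dataframe from the question and splits the one-liner into steps; the names is_parent and family_sizes are only illustrative and are not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question (rows are already in family order).
df = pd.DataFrame({
    "category": ["parent", "child", "parent", "child", "child"],
    "name": ["harry", "smith", "sally", "david", "mike"],
    "age": [29, 12, 41, 19, 16],
})

# Boolean Series: True on every row that starts a new family.
is_parent = df["category"].eq("parent")

# Running count of parents seen so far, shifted so the first family is 0.
df["family_id"] = is_parent.cumsum().sub(1)

# The new column can then be used as an ordinary groupby key,
# for example to count the members of each family.
family_sizes = df.groupby("family_id")["name"].count()

print(df)
print(family_sizes)
```

Note that this relies on the rows being in order and on the first row being a parent. Any rows that appear before the first parent would get family_id -1.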

