Add index in pandas based on each occurrence of a specific value in another column

Question

I have a dataframe like so:

category name   age 
parent  harry   29
child   smith   12
parent  sally   41
child   david   19
child   mike    16

I want to add a column that groups rows into families, starting a new family at each occurrence of the value 'parent' in the category column (the dataframe is already in order). As in:

category name   age  family_id
parent  harry   29     0
child   smith   12     0
parent  sally   41     1
child   david   19     1
child   mike    16     1

I am trying to make the family_id be an incrementing integer.

I've tried a bunch of groupby approaches and am currently trying to write my own apply function, but it's very slow and not working as expected. I haven't been able to find an example that groups rows based on each occurrence of the same value in a column.

Answer 1

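You can use eq to flag the rows where the category column equals 'parent', then cumsum to keep a running count of those flags. The count reaches 1 at the first parent row, so sub(1) subtracts 1 to make family_id start at 0: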

df['family_id'] = df['category'].eq('parent').cumsum().sub(1)
print(df)

  category   name  age  family_id
0   parent  harry   29          0
1    child  smith   12          0
2   parent  sally   41          1
3    child  david   19          1
4    child   mike   16          1
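
To make the one-liner easier to follow, here is a minimal sketch that rebuilds the example dataframe and splits the chain into its intermediate steps. The names is_parent, running_count and family_sizes are only illustrative and do not come from the original answer.

```python
import pandas as pd

# Rebuild the example dataframe from the question.
df = pd.DataFrame({
    "category": ["parent", "child", "parent", "child", "child"],
    "name": ["harry", "smith", "sally", "david", "mike"],
    "age": [29, 12, 41, 19, 16],
})

# Step 1: boolean mask, True on every row that starts a new family.
is_parent = df["category"].eq("parent")

# Step 2: running count of True values (True counts as 1, False as 0).
running_count = is_parent.cumsum()

# Step 3: shift the count so the first family gets id 0.
df["family_id"] = running_count.sub(1)

# Optional check: number of rows per family.
family_sizes = df.groupby("family_id").size()
print(df)
print(family_sizes)
```

Note that this relies on the first row being a parent. Any child rows that appear before the first parent would get family_id -1, since the running count is still 0 at that point.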

solution1
1 ACCEPTED 2020-01-17 15:03:58
