New secondary index based on column value (pandas dataframe)

Question

I have a Pandas DataFrame:

index1	col1	col2
0	12719	row1
1	12719	row2
2	12719	row3
3	12719	row4
4	20000	row1
5	20000	row2
6	20000	row3
7	20000	row4
8	20000	row5

I want to add a new column, index2, holding a running number for each occurrence of a col1 value, restarting at 0 whenever col1 changes:

index1	index2	col1	col2
0	0	12719	row1
1	1	12719	row2
2	2	12719	row3
3	3	12719	row4
4	0	20000	row1
5	1	20000	row2
6	2	20000	row3
7	3	20000	row4
8	4	20000	row5

I have tried different combinations of regex, but none of them fit my case.

Answer 1

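You can use GroupBy.cumcount() to generate the values of the second index: it numbers the rows within each col1 group, starting from 0. Then add that column as a second index level with .set_index() and the parameter append=True, which keeps the existing index instead of replacing it.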

df['index2'] = df.groupby('col1').cumcount()
df = df.set_index('index2', append=True)

Result:

print(df)

           col1  col2
  index2             
0 0       12719  row1
1 1       12719  row2
2 2       12719  row3
3 3       12719  row4
4 0       20000  row1
5 1       20000  row2
6 2       20000  row3
7 3       20000  row4
8 4       20000  row5
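
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It assumes the original index is a plain 0..8 range named index1, as the question's table suggests; the variable name indexed is only illustrative.

```python
import pandas as pd

# Rebuild the sample data from the question (assumed: default 0..8 index named "index1").
df = pd.DataFrame({
    "col1": [12719] * 4 + [20000] * 5,
    "col2": ["row1", "row2", "row3", "row4",
             "row1", "row2", "row3", "row4", "row5"],
})
df.index.name = "index1"

# cumcount() numbers the rows inside each col1 group, starting at 0.
df["index2"] = df.groupby("col1").cumcount()

# append=True keeps index1 and adds index2 as a second index level.
indexed = df.set_index("index2", append=True)
print(indexed)
```

If index2 should stay a regular column rather than an index level, as in the question's desired output, the set_index() step can simply be skipped.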

Answer 2

You can assign a MultiIndex directly to the index, built from the existing index and the per-group running count:

df.index = pd.MultiIndex.from_arrays([df.index,df.groupby('col1').cumcount()])
df
Out[77]: 
           col1  col2
index1               
0      0  12719  row1
1      1  12719  row2
2      2  12719  row3
3      3  12719  row4
4      0  20000  row1
5      1  20000  row2
6      2  20000  row3
7      3  20000  row4
8      4  20000  row5
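
The second level is unnamed in the output above. A small variation, sketched here under the assumption that the levels should be called index1 and index2, passes names= to from_arrays and then uses the new level for a lookup. The variable name first_rows is only illustrative.

```python
import pandas as pd

# Sample data from the question (assumed: default 0..8 index).
df = pd.DataFrame({
    "col1": [12719] * 4 + [20000] * 5,
    "col2": ["row1", "row2", "row3", "row4",
             "row1", "row2", "row3", "row4", "row5"],
})

# Build the two-level index in one step and name both levels explicitly.
df.index = pd.MultiIndex.from_arrays(
    [df.index, df.groupby("col1").cumcount()],
    names=["index1", "index2"],
)

# Select the first row of every col1 group via the second level.
first_rows = df.xs(0, level="index2")
print(first_rows)
```

Naming the levels makes later selections such as xs(..., level="index2") easier to read than positional level numbers.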
