
New secondary index based on column value (pandas dataframe)

I have a Pandas DataFrame:

index1 col1 col2
0 12719 row1
1 12719 row2
2 12719 row3
3 12719 row4
4 20000 row1
5 20000 row2
6 20000 row3
7 20000 row4
8 20000 row5

I want to add a new column, index2, that holds a running count of the occurrences of each col1 value:

index1 index2 col1 col2
0 0 12719 row1
1 1 12719 row2
2 2 12719 row3
3 3 12719 row4
4 0 20000 row1
5 1 20000 row2
6 2 20000 row3
7 3 20000 row4
8 4 20000 row5

I have tried various combinations of regex, but none of them fit my case.

You can use GroupBy.cumcount() to number the rows within each col1 group starting from 0, then append the result to the existing index as a second level by calling .set_index() with append=True.

df['index2'] = df.groupby('col1').cumcount()
df = df.set_index('index2', append=True)

Result:

print(df)

           col1  col2
  index2             
0 0       12719  row1
1 1       12719  row2
2 2       12719  row3
3 3       12719  row4
4 0       20000  row1
5 1       20000  row2
6 2       20000  row3
7 3       20000  row4
8 4       20000  row5
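
For a self-contained check, the sketch below rebuilds the sample frame and applies the same two steps. The construction of the frame is an assumption (the question only shows it printed), and the default RangeIndex stands in for index1.

```python
import pandas as pd

# Sample data reconstructed from the question (assumed construction);
# the default RangeIndex plays the role of index1.
df = pd.DataFrame({
    "col1": [12719] * 4 + [20000] * 5,
    "col2": ["row1", "row2", "row3", "row4",
             "row1", "row2", "row3", "row4", "row5"],
})

# Running count of each col1 value, restarting at 0 for every group.
df["index2"] = df.groupby("col1").cumcount()
index2_values = df["index2"].tolist()

# Append index2 as a second index level; the existing index stays as the first level.
df = df.set_index("index2", append=True)
```

If index2 is only needed as an ordinary column, as in the desired output in the question, the set_index step can be skipped.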

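Alternatively, you can build a MultiIndex from the current index and the cumcount values and assign it to df.index directly: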

import pandas as pd
df.index = pd.MultiIndex.from_arrays([df.index, df.groupby('col1').cumcount()])
df
Out[77]: 
           col1  col2
index1               
0      0  12719  row1
1      1  12719  row2
2      2  12719  row3
3      3  12719  row4
4      0  20000  row1
5      1  20000  row2
6      2  20000  row3
7      3  20000  row4
8      4  20000  row5
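
In the output above the second level has no name. One possible variation, sketched here, passes names= to MultiIndex.from_arrays so both levels are labelled. The sample frame and the level name "index2" are assumptions taken from the question's desired output.

```python
import pandas as pd

# Sample data reconstructed from the question (assumed construction),
# with the original index explicitly named index1.
df = pd.DataFrame(
    {
        "col1": [12719] * 4 + [20000] * 5,
        "col2": ["row1", "row2", "row3", "row4",
                 "row1", "row2", "row3", "row4", "row5"],
    },
    index=pd.RangeIndex(9, name="index1"),
)

# Build the two-level index and name both levels in one step.
df.index = pd.MultiIndex.from_arrays(
    [df.index, df.groupby("col1").cumcount()],
    names=["index1", "index2"],
)

# Rows can then be selected by the (index1, index2) pair.
first_of_second_group = df.loc[(4, 0), "col2"]
```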
