I have a Pandas DataFrame:
index1 | col1 | col2 |
---|---|---|
0 | 12719 | row1 |
1 | 12719 | row2 |
2 | 12719 | row3 |
3 | 12719 | row4 |
4 | 20000 | row1 |
5 | 20000 | row2 |
6 | 20000 | row3 |
7 | 20000 | row4 |
8 | 20000 | row5 |
I want to add a new column, index2, that holds a running count of the occurrences of each col1 value:
index1 | index2 | col1 | col2 |
---|---|---|---|
0 | 0 | 12719 | row1 |
1 | 1 | 12719 | row2 |
2 | 2 | 12719 | row3 |
3 | 3 | 12719 | row4 |
4 | 0 | 20000 | row1 |
5 | 1 | 20000 | row2 |
6 | 2 | 20000 | row3 |
7 | 3 | 20000 | row4 |
8 | 4 | 20000 | row5 |
I have tried different combinations of regex, but none of them fits my case.
You can use GroupBy.cumcount() to generate the values of the second index level, then append that column to the existing index with .set_index() and the parameter append=True.
df['index2'] = df.groupby('col1').cumcount()
df = df.set_index('index2', append=True)
Result:
print(df)
           col1  col2
  index2
0 0       12719  row1
1 1       12719  row2
2 2       12719  row3
3 3       12719  row4
4 0       20000  row1
5 1       20000  row2
6 2       20000  row3
7 3       20000  row4
8 4       20000  row5
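For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the table in the question, and naming the existing index index1 is an assumption based on that table's header.

```python
import pandas as pd

# Rebuild the example data from the question (values copied from the table).
df = pd.DataFrame(
    {
        "col1": [12719] * 4 + [20000] * 5,
        "col2": ["row1", "row2", "row3", "row4",
                 "row1", "row2", "row3", "row4", "row5"],
    }
)
df.index.name = "index1"  # assumption: the original index is called index1

# cumcount() numbers the rows within each col1 group, starting at 0.
df["index2"] = df.groupby("col1").cumcount()

# append=True keeps index1 and adds index2 as a second index level.
df = df.set_index("index2", append=True)

print(df)
```

If index2 should stay an ordinary column, as in the table in the question, the set_index() step can simply be left out.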
Alternatively, you can build a MultiIndex and assign it to the index directly:
df.index = pd.MultiIndex.from_arrays([df.index,df.groupby('col1').cumcount()])
df
Out[77]:
           col1  col2
index1
0      0  12719  row1
1      1  12719  row2
2      2  12719  row3
3      3  12719  row4
4      0  20000  row1
5      1  20000  row2
6      2  20000  row3
7      3  20000  row4
8      4  20000  row5
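A small variation on the same idea, as a sketch: passing names= to MultiIndex.from_arrays labels both levels explicitly. The sample data below is made up for illustration and deliberately interleaves the col1 values, to show that cumcount() counts occurrences in order of appearance and does not need the rows to be sorted or grouped together.

```python
import pandas as pd

# Hypothetical sample data with interleaved col1 values.
df = pd.DataFrame(
    {
        "col1": [12719, 20000, 12719, 20000, 20000],
        "col2": ["a", "b", "c", "d", "e"],
    }
)

# Running count of each col1 value, in order of appearance.
running = df.groupby("col1").cumcount()

# Combine the existing index with the running count and name both levels.
df.index = pd.MultiIndex.from_arrays(
    [df.index, running], names=["index1", "index2"]
)

print(df)
```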