Using a column with duplicate values as a DataFrame index in pandas

Question

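I want to set a DataFrame's index from a column that contains duplicate values. Is there a way in pandas to automatically add a second index level that counts up whenever a value in the first level repeats?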

For example:

   ID              name  company           position
   ------------------------------------------------
0  23      Alex Monoson   Coobit      Sales manager
1  12    Johnny Johnson   Coobit  Marketing manager
2  62         Hans Dupa    Pesik  Marketing manager
3  31    Jessica Heiler  Montino           Engineer
4  92  Dominic Alvorine  Montino                CFO
5  16           Hei Lee   Coobit                CEO

I want to use company as the index, with a second, integer index level next to it.

My expected output:

                    ID    name    position
company
------------------------------------------
Coobit      0       blah  blah        blah
Coobit      1       blah  blah        blah
Coobit      2       blah  blah        blah
Pesik       0       blah  blah        blah
Montino     0       blah  blah        blah
Montino     1       blah  blah        blah
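For context, pandas does not forbid duplicate index labels on its own. The minimal sketch below re-types only the ID and company columns of the table above (a simplification for brevity). It shows that set_index("company") succeeds, but the resulting index is not unique, and a label lookup returns every matching row rather than a single one. That is why a second, counter-based level is needed to address each row individually.

```python
import pandas as pd

# Only the ID and company columns from the question's table, for brevity
df = pd.DataFrame({
    "ID": [23, 12, 62, 31, 92, 16],
    "company": ["Coobit", "Coobit", "Pesik", "Montino", "Montino", "Coobit"],
})

# Setting a column with repeated values as the index is allowed...
by_company = df.set_index("company")
print(by_company.index.is_unique)               # False: "Coobit" and "Montino" repeat

# ...but looking up one label returns all rows carrying that label
print(by_company.loc["Coobit", "ID"].tolist())  # [23, 12, 16]
```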

Answer 1

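You can use groupby(...).cumcount() to number the rows within each company, then set both columns as a two-level index: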

df['index2'] = df.groupby('company').cumcount()  # 0, 1, 2, ... within each company
df = df.set_index(['company', 'index2']).sort_index()  # unique (company, index2) MultiIndex
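A self-contained sketch of the answer, rebuilding the sample data from the question's table. cumcount numbers the rows 0, 1, 2, ... within each company in their original row order, so the (company, index2) pair forms a unique MultiIndex. Note that sort_index orders the companies alphabetically (Coobit, Montino, Pesik), not in the order shown in the expected output.

```python
import pandas as pd

# Sample data re-typed from the table in the question
df = pd.DataFrame({
    "ID": [23, 12, 62, 31, 92, 16],
    "name": ["Alex Monoson", "Johnny Johnson", "Hans Dupa",
             "Jessica Heiler", "Dominic Alvorine", "Hei Lee"],
    "company": ["Coobit", "Coobit", "Pesik", "Montino", "Montino", "Coobit"],
    "position": ["Sales manager", "Marketing manager", "Marketing manager",
                 "Engineer", "CFO", "CEO"],
})

# Running counter within each company, in order of appearance: 0, 1, 2, ...
df["index2"] = df.groupby("company").cumcount()

# (company, index2) is now unique; sort_index sorts companies alphabetically
df = df.set_index(["company", "index2"]).sort_index()

print(df)
print(df.index.is_unique)              # True
print(df.loc[("Coobit", 2), "name"])   # Hei Lee (third Coobit row)
```

With the two-level index, df.loc["Coobit"] still returns all Coobit rows, while df.loc[("Coobit", 2)] selects exactly one.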


Solution 1: score 1, accepted, 2019-11-08 03:39:57
