pandas: enumerate items in each group

I have a DataFrame like this:

    id   chi  prop   ord 
0   100   L    67     0 
1   100   L    68     1 
2   100   L    68     2 
3   100   L    68     3 
4   100   L    70     0 
5   100   L    71     0 
6   100   R    67     0 
7   100   R    68     1 
8   100   R    68     2 
9   100   R    68     3 
10  110   R    70     0 
11  110   R    71     0 
12  101   L    67     0 
13  101   L    68     0 
14  101   L    69     0 
15  101   L    71     0 
16  101   L    72     0 
17  201   R    67     0 
18  201   R    68     0 
19  201   R    69     0

ord essentially gives the ordering of the entries when (prop, chi and id) all have the same value. This isn't quite what I'd like though. Instead, I'd like to be able to enumerate the entries of each group g in {(id, chi)} from 0 to n_g - 1, where n_g is the size of group g. So I'd like to obtain something that looks like this:

    id   chi  prop   count 
0   100   L    67     0 
1   100   L    68     1 
2   100   L    68     2 
3   100   L    68     3 
4   100   L    70     4 
5   100   L    71     5 
6   100   R    67     0 
7   100   R    68     1 
8   100   R    68     2 
9   100   R    68     3 
10  110   R    70     0 
11  110   R    71     1 
12  101   L    67     0 
13  101   L    68     1 
14  101   L    69     2 
15  101   L    71     3 
16  101   L    72     4 
17  201   R    67     0 
18  201   R    68     1 
19  201   R    69     2

I'd like to know if there's a simple way of doing this with pandas. The following comes very close, but it feels way too complicated, and for some reason it won't let me join the resulting DataFrame with the original one.

(df.groupby(['id', 'chi'])
   .apply(lambda g: np.arange(g.shape[0]))
   .apply(pd.Series, 1)
   .stack()
   .rename('counter')
   .reset_index()         
   .drop(columns=['level_2']))

EDIT: A second way, of course, is the for loop, but I'm looking for something more "Pythonic" than:

for gname, idx in df.groupby(['id','chi']).groups.items():
    tmp = df.loc[idx]
    df.loc[idx, 'counter'] = np.arange(tmp.shape[0])

R has a very simple way of achieving this behaviour using the tidyverse packages, but I haven't quite found the well-oiled way to achieve the same thing with pandas. Any help provided is greatly appreciated!

cumcount

df.assign(ord=df.groupby(['id', 'chi']).cumcount())

     id chi  prop  ord
0   100   L    67    0
1   100   L    68    1
2   100   L    68    2
3   100   L    68    3
4   100   L    70    4
5   100   L    71    5
6   100   R    67    0
7   100   R    68    1
8   100   R    68    2
9   100   R    68    3
10  110   R    70    0
11  110   R    71    1
12  101   L    67    0
13  101   L    68    1
14  101   L    69    2
15  101   L    71    3
16  101   L    72    4
17  201   R    67    0
18  201   R    68    1
19  201   R    69    2
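
As a minimal sketch of why this works (the small frame and the column name `count` below are illustrative choices, not the question's full data): `groupby(...).cumcount()` numbers the rows of each group from 0 in their original order and returns a Series that keeps the original index, so it can be assigned straight back without any join.

```python
import pandas as pd

# Small illustrative frame (a subset of the question's data).
df = pd.DataFrame({
    "id":   [100, 100, 100, 100, 100, 110, 110],
    "chi":  ["L", "L", "L", "R", "R", "R", "R"],
    "prop": [67, 68, 68, 67, 68, 70, 71],
})

# cumcount() numbers rows within each (id, chi) group starting at 0.
# The result is aligned to df's index, so assign() needs no join.
result = df.assign(count=df.groupby(["id", "chi"]).cumcount())
counts = result["count"].tolist()
print(result)
```

Writing to a new column such as `count` leaves the original `ord` column in place; reusing the name `ord`, as in the answer above, overwrites it.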

defaultdict and count

from itertools import count
from collections import defaultdict

d = defaultdict(count)

df.assign(ord=[next(d[t]) for t in zip(df.id, df.chi)])

     id chi  prop  ord
0   100   L    67    0
1   100   L    68    1
2   100   L    68    2
3   100   L    68    3
4   100   L    70    4
5   100   L    71    5
6   100   R    67    0
7   100   R    68    1
8   100   R    68    2
9   100   R    68    3
10  110   R    70    0
11  110   R    71    1
12  101   L    67    0
13  101   L    68    1
14  101   L    69    2
15  101   L    71    3
16  101   L    72    4
17  201   R    67    0
18  201   R    68    1
19  201   R    69    2
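
The idea behind this second approach can be sketched without pandas. `defaultdict(count)` creates a fresh `itertools.count()` iterator the first time a key is seen, and each `next()` on it yields 0, 1, 2, ... for that key. The `keys` list and the variable names below are made up for illustration.

```python
from collections import defaultdict
from itertools import count

# Illustrative (id, chi) pairs, in row order.
keys = [(100, "L"), (100, "L"), (100, "R"), (100, "L"), (110, "R")]

counters = defaultdict(count)  # one independent counter per key
first_pass = [next(counters[k]) for k in keys]

# Reusing the same dict continues the numbering instead of restarting it.
second_pass = [next(counters[k]) for k in keys]
```

Because the counters are stateful, the dictionary should be recreated before the list comprehension is run again; otherwise the numbering picks up where it left off. Unlike the groupby version, this does not require rows of the same group to be adjacent, and it numbers them in the order they are encountered.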
