pandas: enumerate items in each group
I have a DataFrame like
id chi prop ord
0 100 L 67 0
1 100 L 68 1
2 100 L 68 2
3 100 L 68 3
4 100 L 70 0
5 100 L 71 0
6 100 R 67 0
7 100 R 68 1
8 100 R 68 2
9 100 R 68 3
10 110 R 70 0
11 110 R 71 0
12 101 L 67 0
13 101 L 68 0
14 101 L 69 0
15 101 L 71 0
16 101 L 72 0
17 201 R 67 0
18 201 R 68 0
19 201 R 69 0
ord essentially gives the ordering of the entries when prop, chi and id all have the same value. This isn't quite what I'd like, though. Instead, I'd like to enumerate the entries of each group g in {(id, chi)} from 0 to n_g - 1, where n_g is the size of group g. So I'd like to obtain something that looks like
id chi prop count
0 100 L 67 0
1 100 L 68 1
2 100 L 68 2
3 100 L 68 3
4 100 L 70 4
5 100 L 71 5
6 100 R 67 0
7 100 R 68 1
8 100 R 68 2
9 100 R 68 3
10 110 R 70 0
11 110 R 71 1
12 101 L 67 0
13 101 L 68 1
14 101 L 69 2
15 101 L 71 3
16 101 L 72 4
17 201 R 67 0
18 201 R 68 1
19 201 R 69 2
I'd like to know if there's a simple way of doing this with pandas. The following comes very close, but it feels way too complicated, and for some reason it won't let me join the resulting dataframe with the original one.
(df.groupby(['id', 'chi'])
   .apply(lambda g: np.arange(g.shape[0]))
   .apply(pd.Series, 1)
   .stack()
   .rename('counter')
   .reset_index()
   .drop(columns=['level_2']))
EDIT: A second way, of course, is the for loop way, but I'm looking for something more "Pythonic" than:
for gname, idx in df.groupby(['id','chi']).groups.items():
    tmp = df.loc[idx]
    df.loc[idx, 'counter'] = np.arange(tmp.shape[0])
R has a very simple way of achieving this behaviour using the tidyverse packages, but I haven't quite found the well-oiled way to achieve the same thing with pandas. Any help provided is greatly appreciated!
cumcount
df.assign(ord=df.groupby(['id', 'chi']).cumcount())
id chi prop ord
0 100 L 67 0
1 100 L 68 1
2 100 L 68 2
3 100 L 68 3
4 100 L 70 4
5 100 L 71 5
6 100 R 67 0
7 100 R 68 1
8 100 R 68 2
9 100 R 68 3
10 110 R 70 0
11 110 R 71 1
12 101 L 67 0
13 101 L 68 1
14 101 L 69 2
15 101 L 71 3
16 101 L 72 4
17 201 R 67 0
18 201 R 68 1
19 201 R 69 2
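For anyone who wants to run this end to end, here is a minimal self-contained sketch. The small DataFrame is a made-up subset of the question's data, and the column name counter is only an illustrative choice. groupby(...).cumcount() numbers the rows of each group from 0 up to the group size minus one, in order of appearance, and it returns a Series that shares the original index, so no join is needed.

```python
import pandas as pd

# Hypothetical subset of the question's data, for illustration only.
df = pd.DataFrame({
    "id":   [100, 100, 100, 100, 110, 110],
    "chi":  ["L", "L", "L", "R", "R", "R"],
    "prop": [67, 68, 68, 67, 70, 71],
})

# cumcount() returns a Series indexed like df, so assign() aligns it directly.
result = df.assign(counter=df.groupby(["id", "chi"]).cumcount())
print(result)
```

Using assign leaves df untouched; writing df['counter'] = df.groupby(['id', 'chi']).cumcount() would add the column in place instead.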
defaultdict and count
from itertools import count
from collections import defaultdict
d = defaultdict(count)
df.assign(ord=[next(d[t]) for t in zip(df.id, df.chi)])
id chi prop ord
0 100 L 67 0
1 100 L 68 1
2 100 L 68 2
3 100 L 68 3
4 100 L 70 4
5 100 L 71 5
6 100 R 67 0
7 100 R 68 1
8 100 R 68 2
9 100 R 68 3
10 110 R 70 0
11 110 R 71 1
12 101 L 67 0
13 101 L 68 1
14 101 L 69 2
15 101 L 71 3
16 101 L 72 4
17 201 R 67 0
18 201 R 68 1
19 201 R 69 2
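To see why this works, here is a small sketch of the same mechanism on a plain list, without pandas. The keys below are made up for illustration. defaultdict(count) creates a fresh itertools.count() iterator, starting at 0, the first time a key is looked up, and each next() call on it hands out that key's next number.

```python
from collections import defaultdict
from itertools import count

# Hypothetical (id, chi) keys, for illustration only.
keys = [(100, "L"), (100, "L"), (100, "R"), (100, "L"), (110, "R")]

# Each unseen key gets its own fresh counter starting at 0.
counters = defaultdict(count)
numbering = [next(counters[k]) for k in keys]
print(numbering)
```

Rows are numbered in order of appearance within each key, so the groups do not have to be contiguous.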