Efficiently enumerate rows in bins for each group in DataFrame
I am trying to find a flexible way to change the number of rows that I bin together for each group in a pandas DataFrame.
Each ID group has ~700 rows, and I would like to add a column called bin_number that starts at 0 and repeats 0 for the bin length I want, then becomes 1 and repeats n times, and so on.
So, say I want a bin_length of 10: I would have 70 bins, and the bin number would run from 0 to 69, each value repeating 10 times, starting over for each ID group. The column would look something like the following:
0
0
0 (repeating bin_length number of times)
.
.
1
1
1
A plus would be if it could handle a different number of rows in each group.
This is what I have been working with, but it doesn't seem like the right approach.
df.groupby("ID").apply(lambda x: np.arange(len(df)) // 10)
Any pointers appreciated! Thanks!
Try groupby cumcount with floor division (//):
df['bins'] = df.groupby('ID').cumcount() // bin_len
Sample DataFrame with a bin length of 2:
ID bins
0 1 0
1 1 0
2 1 1
3 1 1
4 1 2
5 1 2
6 1 3
7 1 3
8 2 0
9 2 0
10 2 1
11 2 1
12 2 2
13 2 2
14 2 3
15 2 3
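The same line should also cope with groups of different sizes, since cumcount restarts at 0 for every ID regardless of how many rows the group has. A minimal sketch, assuming a made-up frame where ID 1 has 5 rows and ID 2 has 3 rows (the data is hypothetical, not from the question):

```python
import pandas as pd

# Hypothetical data: group 1 has 5 rows, group 2 has 3 rows.
df = pd.DataFrame({'ID': [1, 1, 1, 1, 1, 2, 2, 2]})
bin_len = 2

# cumcount() numbers the rows 0..n-1 within each group;
# floor division by bin_len turns that counter into a bin number.
df['bins'] = df.groupby('ID').cumcount() // bin_len

print(df['bins'].tolist())  # [0, 0, 1, 1, 2, 0, 0, 1]
```

The last bin of a group is simply shorter when the group size is not a multiple of bin_len.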
Complete working example:
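import numpy as np
import pandas as pd

# Two groups (ID 1 and ID 2) with 8 rows each
df = pd.DataFrame({
    'ID': np.repeat(np.arange(1, 3), 8)
})
bin_len = 2
# Number the rows within each group, then floor-divide to get the bin
df['bins'] = df.groupby('ID').cumcount() // bin_len
print(df)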
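As for why the attempt in the question misbehaves: inside the lambda it uses len(df), the length of the whole frame, rather than len(x), the length of the current group, and apply hands back one array per group instead of a row-aligned column. If you would rather stay close to that np.arange idea, a rough sketch using transform (my suggestion, not something from the question) seems to give the same result as cumcount:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'ID': np.repeat(np.arange(1, 3), 8)})
bin_len = 2

# len(x) is the size of the current group, not of the whole frame;
# transform aligns each per-group array back to the original index.
via_transform = df.groupby('ID')['ID'].transform(
    lambda x: np.arange(len(x)) // bin_len
)

# The cumcount approach from above, for comparison.
via_cumcount = df.groupby('ID').cumcount() // bin_len

print((via_transform == via_cumcount).all())
```

cumcount is still the simpler choice, since it avoids calling a Python lambda once per group.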