
Efficiently enumerate rows in bins for each group in DataFrame

I am trying to find a way of flexibly changing the number of rows that I bin for each group in a pandas DataFrame.

Each group ID has ~700 rows, and I would like to add a column called bin_number that starts at 0 and repeats 0 for the length of the bin I want, then switches to 1 and repeats n times, and so on.

So, say I want a bin_length of 10: I would have 70 bins, and the bin number would run from 0 to 69, each value repeating 10 times and starting over for each ID group. The column would look something like the following:

0
0
0 (repeating bin_length number of times)
.
.
1
1
1 

A plus would be if it could handle a different number of rows in each group.

This is what I have been working with, but it doesn't seem like the right approach.

df.groupby("ID").apply(lambda x: np.arange(len(df)) // 10)

Any pointers appreciated! Thanks!

Try groupby cumcount + //:

df['bins'] = df.groupby('ID').cumcount() // bin_len

Sample DataFrame with a bin length of 2:

    ID  bins
0    1     0
1    1     0
2    1     1
3    1     1
4    1     2
5    1     2
6    1     3
7    1     3
8    2     0
9    2     0
10   2     1
11   2     1
12   2     2
13   2     2
14   2     3
15   2     3
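
To see why this works, it can help to look at the intermediate result: cumcount numbers the rows within each group starting from 0, and floor division by the bin length then collapses each run of bin_len consecutive positions into one bin number. A minimal sketch on a small made-up frame (the data and the column name pos are only for illustration):

```python
import pandas as pd

# Hypothetical toy data: two groups of four rows each.
df = pd.DataFrame({'ID': [1, 1, 1, 1, 2, 2, 2, 2]})
bin_len = 2

# Position of each row within its own group, restarting at 0 for every ID.
df['pos'] = df.groupby('ID').cumcount()

# Floor division maps positions 0, 1 to bin 0 and positions 2, 3 to bin 1.
df['bins'] = df['pos'] // bin_len

print(df)
```

Note that cumcount follows the current row order of the frame, so sort beforehand if the bins should follow a particular ordering within each group.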

Complete Working Example:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'ID': np.repeat(np.arange(1, 3), 8)
})

bin_len = 2

df['bins'] = df.groupby('ID').cumcount() // bin_len

print(df)
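
The same line should also cover the case from the question where groups have different numbers of rows, since cumcount restarts for every ID regardless of group size; the last bin of a group is simply shorter when the group length is not a multiple of bin_len. A sketch with invented group sizes (5 and 3 rows). For comparison it also shows the attempt from the question adjusted to use len(x) instead of len(df) and transform instead of apply; the column name bins_alt is made up here:

```python
import numpy as np
import pandas as pd

# Hypothetical data: group 1 has 5 rows, group 2 has 3 rows.
df = pd.DataFrame({'ID': [1] * 5 + [2] * 3})
bin_len = 2

# cumcount restarts per group, so unequal group sizes need no special handling.
df['bins'] = df.groupby('ID').cumcount() // bin_len

# The question's approach, adjusted to use each group's own length (len(x))
# and transform, so the result lines up with the original rows.
df['bins_alt'] = df.groupby('ID')['ID'].transform(
    lambda x: np.arange(len(x)) // bin_len
)

print(df)
```

Both columns come out the same here; the cumcount version avoids calling a Python lambda once per group, which tends to matter when there are many groups.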


 