Efficiently enumerate rows in bins for each group in a DataFrame

Question

I am trying to find a way to flexibly change the number of rows that I bin for each group in a pandas DataFrame.

Each group ID has ~700 rows, and I would like to add a column called bin_number that starts at 0 and repeats 0 for the bin length I want, then switches to 1 and repeats n times, and so on.

So, say I want a bin_length of 10: I would have 70 bins, and the bin number would run from 0 to 69, each value repeating 10 times and starting over for each ID group. The column would look something like the following:

0
0
0 (repeating bin_length number of times)
.
.
1
1
1

A plus would be if it could adapt to a different number of rows in each group.

This is what I have been working with, but it doesn't seem like the right approach.

df.groupby("ID").apply(lambda x: np.arange(len(df)) // 10)

Any pointers appreciated! Thanks!

Answer 1

Try groupby cumcount + //:

df['bins'] = df.groupby('ID').cumcount() // bin_len
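
To see why this works, it may help to look at the two steps separately: cumcount numbers the rows of each group from 0, and floor division by the bin length collapses consecutive positions into one bin number. A minimal sketch, assuming a made-up toy frame (the "a"/"b" IDs and bin_len = 2 are illustrative, not from the question):

```python
import pandas as pd

# Hypothetical toy data: two groups of five rows each.
df = pd.DataFrame({"ID": ["a"] * 5 + ["b"] * 5})
bin_len = 2  # assumed bin length, for illustration only

# Step 1: 0-based position of each row within its ID group.
pos = df.groupby("ID").cumcount()

# Step 2: floor division maps positions 0,1 -> bin 0; 2,3 -> bin 1; 4 -> bin 2.
bins = pos // bin_len

print(pos.tolist())   # [0, 1, 2, 3, 4, 0, 1, 2, 3, 4]
print(bins.tolist())  # [0, 0, 1, 1, 2, 0, 0, 1, 1, 2]
```

Because the position counter restarts for every ID, the bin numbers restart as well, and the last bin of a group is simply shorter when the group size is not a multiple of bin_len.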

Sample DataFrame with a bin length of 2:

    ID  bins
0    1     0
1    1     0
2    1     1
3    1     1
4    1     2
5    1     2
6    1     3
7    1     3
8    2     0
9    2     0
10   2     1
11   2     1
12   2     2
13   2     2
14   2     3
15   2     3

Complete Working Example:

import numpy as np
import pandas as pd

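# Two groups (ID 1 and ID 2) with 8 rows each
df = pd.DataFrame({
    'ID': np.repeat(np.arange(1, 3), 8)
})

# Number of rows per bin
bin_len = 2

# Position of each row within its group, floor-divided by the bin length
df['bins'] = df.groupby('ID').cumcount() // bin_len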

print(df)
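
The question also asks whether this adapts to groups with different numbers of rows. A short sketch of that case, assuming hypothetical group sizes of 7 and 4 and a bin length of 3 (none of these values come from the question):

```python
import pandas as pd

# Hypothetical uneven groups: ID 1 has 7 rows, ID 2 has 4 rows.
df = pd.DataFrame({"ID": [1] * 7 + [2] * 4})
bin_len = 3  # assumed bin length, for illustration only

# Same expression as above; no per-group length needs to be supplied.
df["bins"] = df.groupby("ID").cumcount() // bin_len

# Number of rows that landed in each (ID, bin) pair.
rows_per_bin = df.groupby(["ID", "bins"]).size()

print(df["bins"].tolist())  # [0, 0, 0, 1, 1, 1, 2, 0, 0, 0, 1]
print(rows_per_bin.to_dict())
```

Each group gets as many bins as it needs, and a trailing partial bin holds whatever rows are left over.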

Solution 1
0 2021-05-29 00:03:42
