Pandas DataFrame: select random rows from each group and find the average of each group

Question

I have a DataFrame df that looks like this:

             ID1       ID2         Bool           Count
0       12868123  387DB71C            0               1
1       12868123  84C0E502            1              11
2       12868123  387DB71C            1               1
8       12868123  80A9DCFC            0              16
9       12868123  7A260136            1              20
10      12868123  80A9DCFC            0              16
11      12868123  80BB4591            0              36
327295  8617B7D9  76A08B0E            0              19
327296  8617B7D9  76A08B0E            0              19
327297  8617B7D9  76D0DA26            1               2
327298  8617B7D9  7C92B2A6            1               3
327299  8617B7D9  75883296            1               1
327300  8617B7D9  78711A4F            0              12
327301  8617B7D9  78711A4F            0              12
327302  8617B7D9  78711A4F            0              12
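To run the answers below, you can rebuild the sample frame as shown next. This is a sketch that assumes ID1 and ID2 are strings, since a value such as 8617B7D9 cannot be an integer, and that Bool and Count are integers. The original row labels are kept so sampled rows can be traced back.

```python
import pandas as pd

# Rebuild the question's frame; ID1/ID2 as strings is an assumption
df = pd.DataFrame(
    {
        'ID1': ['12868123'] * 7 + ['8617B7D9'] * 8,
        'ID2': ['387DB71C', '84C0E502', '387DB71C', '80A9DCFC', '7A260136',
                '80A9DCFC', '80BB4591', '76A08B0E', '76A08B0E', '76D0DA26',
                '7C92B2A6', '75883296', '78711A4F', '78711A4F', '78711A4F'],
        'Bool': [0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0],
        'Count': [1, 11, 1, 16, 20, 16, 36, 19, 19, 2, 3, 1, 12, 12, 12],
    },
    # keep the original (non-contiguous) row labels
    index=[0, 1, 2, 8, 9, 10, 11,
           327295, 327296, 327297, 327298, 327299, 327300, 327301, 327302],
)
print(df)
```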

I want to do two things:

1- I want to "randomly" extract n unique rows for each (ID1, Bool) pair. So if n = 2, one possible result could be:

             ID1       ID2         Bool           Count
0       12868123  387DB71C            0               1
8       12868123  80A9DCFC            0              16
1       12868123  84C0E502            1              11
2       12868123  387DB71C            1               1
327295  8617B7D9  76A08B0E            0              19
327296  8617B7D9  76A08B0E            0              19
327297  8617B7D9  76D0DA26            1               2
327298  8617B7D9  7C92B2A6            1               3

I tried looking for something along the lines of df.groupby(['ID1', 'Bool']).random(size=n), but couldn't figure it out.
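There is no `.random(size=n)` method, but the closest built-in is `DataFrameGroupBy.sample`, which exists in pandas 1.1 and later. The sketch below assumes that version. It uses a trimmed copy of the data with ID2 and the original row labels dropped, and `random_state=42` is an arbitrary seed added only so the draw can be repeated. Rows are drawn without replacement by default, so each group returns n distinct rows.

```python
import pandas as pd

# Trimmed copy of the question's frame (ID2 and original row labels omitted)
df = pd.DataFrame({
    'ID1': ['12868123'] * 7 + ['8617B7D9'] * 8,
    'Bool': [0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0],
    'Count': [1, 11, 1, 16, 20, 16, 36, 19, 19, 2, 3, 1, 12, 12, 12],
})

n = 2
# n random rows per (ID1, Bool) group, without replacement by default;
# random_state only makes the draw reproducible
sampled = df.groupby(['ID1', 'Bool']).sample(n=n, random_state=42)
print(sampled)
```

The sampled rows keep their original index labels, so you can still look them up in `df`.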

2- I then want to calculate the average Count for each (ID1, Bool) pair, so that the final resulting DataFrame is:

            ID1        Bool           AverageCount
0       12868123         0              8.5
1       12868123         1              6
2       8617B7D9         0              19
3       8617B7D9         1              2.5

I think I have the second part figured out:

df.groupby(['ID1','Bool'])['Count'].mean()
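As a check, the sketch below applies that one-liner to the eight rows of the example sample (n = 2), typed in with ID2 omitted. It reproduces the expected table: (1 + 16) / 2 = 8.5, (11 + 1) / 2 = 6, (19 + 19) / 2 = 19 and (2 + 3) / 2 = 2.5. Adding `reset_index(name=...)` turns the grouped Series back into the flat AverageCount layout.

```python
import pandas as pd

# The eight rows of the example sample, with their original row labels
sample = pd.DataFrame(
    {
        'ID1': ['12868123'] * 4 + ['8617B7D9'] * 4,
        'Bool': [0, 0, 1, 1, 0, 0, 1, 1],
        'Count': [1, 16, 11, 1, 19, 19, 2, 3],
    },
    index=[0, 8, 1, 2, 327295, 327296, 327297, 327298],
)

# Mean Count per (ID1, Bool) pair, flattened back into ordinary columns
result = (sample.groupby(['ID1', 'Bool'])['Count']
                .mean()
                .reset_index(name='AverageCount'))
print(result)
```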

Answer 1

groupby + sample

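n = 2
# For each (ID1, Bool) group, draw n random rows without replacement
# and average their Count values
df1 = df.groupby(['ID1', 'Bool']).apply(
    lambda g: g.sample(n)['Count'].mean()
).reset_index(name='AverageCount')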
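On pandas 1.1 or later, you can also chain the same two steps without `apply`. First sample n rows per group with `GroupBy.sample`, then group the sampled rows again and take the mean. This sketch assumes that pandas version and uses the arbitrary seed `random_state=0`. Like the per-group `sample` call in the lambda, `GroupBy.sample(n=2)` raises a ValueError if any group has fewer than 2 rows. That does not happen here, because the smallest group in this data has 3 rows.

```python
import pandas as pd

# Trimmed copy of the question's frame (ID2 and original row labels omitted)
df = pd.DataFrame({
    'ID1': ['12868123'] * 7 + ['8617B7D9'] * 8,
    'Bool': [0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0],
    'Count': [1, 11, 1, 16, 20, 16, 36, 19, 19, 2, 3, 1, 12, 12, 12],
})

n = 2
keys = ['ID1', 'Bool']
# Step 1: n random rows per group (no replacement by default)
# Step 2: average Count over the sampled rows of each group
df1 = (df.groupby(keys)
         .sample(n=n, random_state=0)
         .groupby(keys)['Count']
         .mean()
         .reset_index(name='AverageCount'))
print(df1)
```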

Answer 2

You can use groupby with numpy.random.choice:

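import numpy as np

n = 2
# np.random.choice samples WITH replacement by default (replace=True),
# so the same row can be drawn twice; pass replace=False for n unique rows
df1 = (df.groupby(['ID1', 'Bool'])['Count']
         .apply(lambda x: np.mean(np.random.choice(x, n)))
         .reset_index(name='AverageCount'))
print(df1)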
        ID1  Bool  AverageCount
0  12868123     0          18.5
1  12868123     1           6.0
2  8617B7D9     0          19.0
3  8617B7D9     1           3.0
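`np.random.choice` samples with replacement by default, so it can pick the same row twice. To draw n unique rows, as the question asks, pass `replace=False`. The sketch below does that with NumPy's `Generator` API (`np.random.default_rng`, available since NumPy 1.17) instead of the legacy function. Two parts are added assumptions for illustration: the seed 0, and the `min(n, len(x))` guard, which makes a group smaller than n fall back to all of its rows instead of raising an error.

```python
import numpy as np
import pandas as pd

# Trimmed copy of the question's frame (ID2 and original row labels omitted)
df = pd.DataFrame({
    'ID1': ['12868123'] * 7 + ['8617B7D9'] * 8,
    'Bool': [0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0],
    'Count': [1, 11, 1, 16, 20, 16, 36, 19, 19, 2, 3, 1, 12, 12, 12],
})

n = 2
rng = np.random.default_rng(0)  # seeded generator for repeatable draws

# replace=False: each drawn Count comes from a distinct row of the group;
# min(n, len(x)) avoids a ValueError for groups with fewer than n rows
df1 = (df.groupby(['ID1', 'Bool'])['Count']
         .apply(lambda x: rng.choice(x.to_numpy(), size=min(n, len(x)),
                                     replace=False).mean())
         .reset_index(name='AverageCount'))
print(df1)
```

With this data, the (8617B7D9, 1) group can now only average 1.5, 2.0 or 2.5. The 3.0 in the output above is only possible when the row with Count 3 is drawn twice.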

2 answers

Answer 1: score 3, accepted, posted 2017-01-12 07:07:49

Answer 2: score 3, posted 2017-01-12 07:12:31
