Pandas Dataframe select random rows from grouping, and finding average of each grouping
I have a dataframe df that looks like this:
ID1 ID2 Bool Count
0 12868123 387DB71C 0 1
1 12868123 84C0E502 1 11
2 12868123 387DB71C 1 1
8 12868123 80A9DCFC 0 16
9 12868123 7A260136 1 20
10 12868123 80A9DCFC 0 16
11 12868123 80BB4591 0 36
327295 8617B7D9 76A08B0E 0 19
327296 8617B7D9 76A08B0E 0 19
327297 8617B7D9 76D0DA26 1 2
327298 8617B7D9 7C92B2A6 1 3
327299 8617B7D9 75883296 1 1
327300 8617B7D9 78711A4F 0 12
327301 8617B7D9 78711A4F 0 12
327302 8617B7D9 78711A4F 0 12
I want to do two things:
1- I want to "randomly" extract n unique rows for each (ID1, Bool) instance. So if n = 2, one possible result could be:
ID1 ID2 Bool Count
0 12868123 387DB71C 0 1
8 12868123 80A9DCFC 0 16
1 12868123 84C0E502 1 11
2 12868123 387DB71C 1 1
327295 8617B7D9 76A08B0E 0 19
327296 8617B7D9 76A08B0E 0 19
327297 8617B7D9 76D0DA26 1 2
327298 8617B7D9 7C92B2A6 1 3
I tried looking for something along the lines of df.groupby('ID1', 'Bool').random(size=n), but couldn't figure it out.
2- I then want to calculate the average Count for each (ID1, Bool) pair, so that the final resulting DF is:
ID1 Bool AverageCount
0 12868123 0 8.5
1 12868123 1 6
2 8617B7D9 0 19
3 8617B7D9 1 2.5
I think I have the second part figured out:
df.groupby(['ID1','Bool'])['Count'].mean()
You can use groupby with numpy.random.choice:
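import numpy as np

n = 2
# Per (ID1, Bool) group: draw n Count values at random and average them.
# np.random.choice samples with replacement by default (repeats possible).
df1 = df.groupby(['ID1', 'Bool'])['Count'] \
.apply(lambda x: np.mean(np.random.choice(x, n))) \
.reset_index(name='AverageCount')
print(df1)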
ID1 Bool AverageCount
0 12868123 0 18.5
1 12868123 1 6.0
2 8617B7D9 0 19.0
3 8617B7D9 1 3.0
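If the n rows in each group must be distinct, as the question asks, one possible variation is to sample without replacement first and then average. The sketch below assumes pandas 1.1 or later, where GroupBy.sample is available; it rebuilds the question's data so it runs on its own, and random_state=0 is an arbitrary choice made only so that repeated runs give the same draw.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID1": ["12868123"] * 7 + ["8617B7D9"] * 8,
    "ID2": ["387DB71C", "84C0E502", "387DB71C", "80A9DCFC", "7A260136",
            "80A9DCFC", "80BB4591", "76A08B0E", "76A08B0E", "76D0DA26",
            "7C92B2A6", "75883296", "78711A4F", "78711A4F", "78711A4F"],
    "Bool": [0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0],
    "Count": [1, 11, 1, 16, 20, 16, 36, 19, 19, 2, 3, 1, 12, 12, 12],
}, index=[0, 1, 2, 8, 9, 10, 11, 327295, 327296, 327297, 327298,
          327299, 327300, 327301, 327302])

n = 2

# Step 1: draw n distinct rows per (ID1, Bool) group (no replacement).
# random_state is fixed here only to make the run repeatable.
sampled = df.groupby(["ID1", "Bool"]).sample(n=n, random_state=0)

# Step 2: average Count over the sampled rows of each group.
df1 = (sampled.groupby(["ID1", "Bool"])["Count"]
       .mean()
       .reset_index(name="AverageCount"))
print(df1)
```

Here sampled holds the rows for part 1 and df1 the averages for part 2. Sampling without replacement raises a ValueError if any group has fewer than n rows, so data with small groups would need those groups filtered out or handled separately first.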