Calculating mean grades for deciles within a dataset with Python, grouped by another field
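import pandas as pd

# Load the data and keep only rows with a positive 'number'
df_orig = pd.read_csv('test_sample.csv')
df_orig = df_orig[df_orig['number'] > 0]

# Split 'number' into 5 equal-frequency bins (q=10 would give true deciles)
# and take the mean of each bin, across the whole dataset
decile_stats = df_orig.groupby(pd.qcut(df_orig['number'], 5))['number'].mean()
print(decile_stats)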
I'm trying to use Python to calculate statistics for deciles of my dataset. I can calculate the mean of each decile using qcut, but I want to group my numbers by the values in a second column, so that the deciles are calculated and reported separately for each value in the family column.
   family  number
0    1000    0.04
1    1000    0.20
2    1000    0.04
3    1000    0.16
4    1000    0.08
5    1000    0.02
6    1000    0.02
7    1000    0.02
8    1000    0.64
9    1000    0.04
My desired output would be:
Q1  1000    0.028617
Q2  1000    0.105060
Q3  1000    0.452467
Q4  1000    2.644886
Q5  1000  141.749797...
etc., with each 'family' shown (1000, 2000, 3000 in this case).
IIUC, you can use:
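# df is the DataFrame with the 'family' and 'number' columns
N = 3  # number of quantile bins; use 10 for deciles
labels = [f'Q{i}' for i in range(1, N+1)]
# Within one family, bin 'number' into N quantiles and average each bin
decile = lambda x: x.groupby(pd.qcut(x['number'], N, labels=labels)).mean()
# Apply per family, keep the mean of 'number', and flatten the index
out = df.groupby('family').apply(decile)['number'].rename('mean').reset_index()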
Output:
>>> out
   family number      mean
0    1000     Q1  0.030000
1    1000     Q2  0.080000
2    1000     Q3  0.333333
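An alternative that may be easier to read is to label each row with its per-family quantile bin first and then group on both columns. The following is only a sketch under a few assumptions: the rows for family 2000 are made up for illustration, the `bin` column name is my own choice, and `labels=False` is used so that `qcut` returns integer bin codes that are then turned into Q1..QN strings.

```python
import pandas as pd

# Family 1000 comes from the question; family 2000 is invented for illustration.
df = pd.DataFrame({
    'family': [1000] * 10 + [2000] * 6,
    'number': [0.04, 0.20, 0.04, 0.16, 0.08, 0.02, 0.02, 0.02, 0.64, 0.04,
               1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})

N = 3  # number of quantile bins; 10 would give deciles

# Integer bin code (0..N-1) for each row, computed within its own family.
codes = df.groupby('family')['number'].transform(
    lambda s: pd.qcut(s, N, labels=False)
)

# Turn the codes into labels Q1..QN ('bin' is a hypothetical column name).
df['bin'] = 'Q' + (codes + 1).astype(int).astype(str)

# Mean of 'number' for every (family, bin) pair.
out = (
    df.groupby(['family', 'bin'])['number']
      .mean()
      .rename('mean')
      .reset_index()
)
print(out)
```

For family 1000 this gives the same three means as above. Note that with many tied values and a larger N, `qcut` can raise a ValueError about non-unique bin edges, so real data may need extra handling.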