[英]What is the lambda function to compute mean of each column/row in a dataframe (pandas)?
[英]Compute percentage for each row in pandas dataframe
country_name country_code val_code \
United States of America 231 1
United States of America 231 2
United States of America 231 3
United States of America 231 4
United States of America 231 5
y191 y192 y193 y194 y195 \
47052179 43361966 42736682 43196916 41751928
1187385 1201557 1172941 1176366 1192173
28211467 27668273 29742374 27543836 28104317
179000 193000 233338 276639 249688
12613922 12864425 13240395 14106139 15642337
在上面的数据框中,我想为每一行计算该val_code占用的总百分比,从而得到foll。 数据框。
即总结每一行并除以所有行的总和
country_name country_code val_code \
United States of America 231 1
United States of America 231 2
United States of America 231 3
United States of America 231 4
United States of America 231 5
perc
50.14947129
1.363631254
32.48344744
0.260213146
15.74323688
现在,我这样做,但它没有用
grp_df = df.groupby(['country_name', 'val_code']).agg()
pct_df = grp_df.groupby(level=0).apply(lambda x: 100*x/float(x.sum()))
您可以使用lambda
函数获取每列的百分比,如下所示:
>>> df.iloc[:, 3:].apply(lambda x: x / x.sum())
y191 y192 y193 y194 y195
0 0.527231 0.508411 0.490517 0.500544 0.480236
1 0.013305 0.014088 0.013463 0.013631 0.013713
2 0.316116 0.324405 0.341373 0.319164 0.323259
3 0.002006 0.002263 0.002678 0.003206 0.002872
4 0.141342 0.150833 0.151969 0.163455 0.179920
您的示例没有val_code
任何重复值,因此我不确定您希望数据显示的方式(即显示每个v val_code
组的列总数与总数的val_code
。)
Ge所有感兴趣的列的总数,然后添加百分比列:
In [35]:
total = np.sum(df.ix[:,'y191':].values)
df['percent'] = df.ix[:,'y191':].sum(axis=1)/total * 100
df
Out[35]:
country_name country_code val_code y191 y192 \
0 United States of America 231 1 47052179 43361966
1 United States of America 231 1 1187385 1201557
2 United States of America 231 1 28211467 27668273
3 United States of America 231 1 179000 193000
4 United States of America 231 1 12613922 12864425
y193 y194 y195 percent
0 42736682 43196916 41751928 50.149471
1 1172941 1176366 1192173 1.363631
2 29742374 27543836 28104317 32.483447
3 233338 276639 249688 0.260213
4 13240395 14106139 15642337 15.743237
所以np.sum
会将所有值相加:
In [32]:
total = np.sum(df.ix[:,'y191':].values)
total
Out[32]:
434899243
然后我们在感兴趣的cols上调用.sum(axis=1)/total * 100
来逐行求和,除以total并乘以100得到一个百分比。
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.