How to calculate percentages and format a table
I have a table that looks like this:
c_id soap_spend towel_spend year_spend cluster
c1 1 2 3 1
c2 2 4 6 2
c3 1 2 3 2
c4 3 2 5 1
I want to return two tables.
Table 1:
cluster_1 cluster_2
% soap_spend a = (1+3)/(3+5) c= (2+1)/(6+3) -- soap_spend.sum/year_spend.sum
% towel_spend b = (2+2)/(3+5) d= (2+4)/(6+3) -- towel_spend.sum/year_spend.sum
Table 2:
Uses the results from table 1.
cluster_1 cluster_2
% soap_spend a/mean(soap_spend) c/mean(towel_spend)
% towel_spend b/mean(soap_spend) d/mean(towel_spend)
My code:
cols = ['soap_spend', 'towel_spend']
df.groupby('cluster').apply(df[col].sum()/df['year_spend'].sum()
Any suggestions on how to fix the code?
You don't need to group by cluster; it's enough to filter the rows with loc and sum them, like this (val is one cluster label):
numerator = df['soap_spend'].loc[df['cluster'] == val].sum()
denominator = df['year_spend'].loc[df['cluster'] == val].sum()
The full code would look like this:
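import pandas as pd

# One row per spend category; each cluster is added as a column
df_results = pd.DataFrame(index=['% soap_spend', '% towel_spend'])
for val in df.cluster.unique():
    soap_numerator = df['soap_spend'].loc[df['cluster'] == val].sum()
    denominator = df['year_spend'].loc[df['cluster'] == val].sum()
    towel_numerator = df['towel_spend'].loc[df['cluster'] == val].sum()
    soap_spend = soap_numerator / denominator
    towel_spend = towel_numerator / denominator
    col = [soap_spend, towel_spend]
    # Plain column assignment; insert() at position int(val) fails on an empty frame
    df_results['cluster_{}'.format(int(val))] = col
# Fix the column order
df_results = df_results[['cluster_1', 'cluster_2']]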
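If you would rather keep the groupby approach from the question, the same table can probably be built without a loop. This is only a sketch: it assumes df holds the sample data from the question, and the names sums, shares and table_1 are made up for illustration.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'c_id': ['c1', 'c2', 'c3', 'c4'],
    'soap_spend': [1, 2, 1, 3],
    'towel_spend': [2, 4, 2, 2],
    'year_spend': [3, 6, 3, 5],
    'cluster': [1, 2, 2, 1],
})

cols = ['soap_spend', 'towel_spend']

# Sum every spend column within each cluster
sums = df.groupby('cluster')[cols + ['year_spend']].sum()

# Divide each category sum by the cluster's yearly total (row-wise alignment)
shares = sums[cols].div(sums['year_spend'], axis=0)

# Transpose so clusters become columns, then relabel (illustrative labels)
table_1 = shares.T
table_1.index = ['% ' + c for c in cols]
table_1.columns = ['cluster_{}'.format(c) for c in table_1.columns]
print(table_1)
```

With the sample data, cluster 1 works out to (1+3)/(3+5) for soap and (2+2)/(3+5) for towels, which matches a and b in the question.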
For table 2, divide each value of table 1 by the mean of its row (table_1 here is the df_results built above):
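table_2 = table_1.copy()
for row in range(table_1.shape[0]):
    # Read from table_1 so values already written to table_2 don't change the mean
    row_mean = table_1.iloc[row].mean()
    for col in table_1.columns:
        # Single .loc assignment instead of chained indexing
        table_2.loc[table_2.index[row], col] = table_1[col].iloc[row] / row_mean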
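The row-wise division can likely be done in one step with DataFrame.div. The sketch below assumes "mean" means the mean of each row of table 1, which is how the loop above reads it; if the question intended a different mean, swap in that series. The table_1 values are the ones worked out from the question's sample data.

```python
import pandas as pd

# Table 1 values computed from the question's sample data
table_1 = pd.DataFrame(
    {'cluster_1': [4 / 8, 4 / 8], 'cluster_2': [3 / 9, 6 / 9]},
    index=['% soap_spend', '% towel_spend'],
)

# Mean of each row across the cluster columns
row_means = table_1.mean(axis=1)

# Divide every row by its own mean (axis=0 aligns row_means on the index)
table_2 = table_1.div(row_means, axis=0)
print(table_2)
```

Because every row is divided by its own mean, the values in each row of table_2 average to 1, which is a quick sanity check on the result.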