Pandas DataFrame: Calculate percentage difference between rows?
I have a year-wise DataFrame in which each year has three parameters: year, type, and value. I'm trying to calculate the percentage of taken vs. empty. For example, year 2014 has a total of 50 empty and 50 taken, so 50% is empty and 50% is taken, as shown in the final df.
df
year type value
0 2014 Total 100
1 2014 Empty 50
2 2014 Taken 50
3 2013 Total 2000
4 2013 Empty 100
5 2013 Taken 1900
6 2012 Total 50
7 2012 Empty 45
8 2012 Taken 5
Final df
year Empty Taken
0 2014 50 50
0 2013 ... ...
0 2012 ... ...
Should I shift the cells up and then calculate the percentage, or is there another method?
You can use pivot_table:
new = df[df['type'] != 'Total']
res = (new.pivot_table(index='year',columns='type',values='value').sort_values(by='year',ascending=False).reset_index())
which gets you:
res
year Empty Taken
0 2014 50 50
1 2013 100 1900
2 2012 45 5
And then you can get the percentages for each column:
total = (res['Empty'] + res['Taken'])
for col in ['Empty','Taken']:
    res[col+'_perc'] = res[col] / total
year Empty Taken Empty_perc Taken_perc
2014 50 50 0.50 0.50
2013 100 1900 0.05 0.95
2012 45 5 0.90 0.10
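For anyone who wants to run this end to end, here is a minimal, self-contained sketch of the same idea. It rebuilds the sample data from the question and, as a swapped-in variation, uses pivot and DataFrame.div instead of the explicit loop. The names counts and shares are made up for illustration, and it assumes each (year, type) pair occurs only once.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "year": [2014, 2014, 2014, 2013, 2013, 2013, 2012, 2012, 2012],
    "type": ["Total", "Empty", "Taken"] * 3,
    "value": [100, 50, 50, 2000, 100, 1900, 50, 45, 5],
})

# Drop the 'Total' rows and spread 'type' into columns, one row per year.
# Assumption: every (year, type) pair is unique, so pivot needs no aggregation.
counts = df[df["type"] != "Total"].pivot(index="year", columns="type", values="value")

# Divide each column by the row sum; axis=0 aligns the divisor on the year index.
shares = counts.div(counts.sum(axis=1), axis=0)

# Put counts and fractions side by side, newest year first.
res = (counts.join(shares.add_suffix("_perc"))
             .sort_index(ascending=False)
             .reset_index())
print(res)
```

The values in the _perc columns are fractions between 0 and 1; multiply by 100 if you want percentages.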
As @sophods pointed out, you can use pivot_table to rearrange your DataFrame. To add to that answer: I think you're after the percentage, so I suggest you keep the 'Total' record and then apply your calculation:
#pivot your data
res = (df.pivot_table(index='year',columns='type',values='value')).reset_index()
#calculate percentages of empty and taken
res['Empty'] = res['Empty']/res['Total']
res['Taken'] = res['Taken']/res['Total']
#final dataframe
res = res[['year', 'Empty', 'Taken']]
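Dividing by the 'Total' row only gives the same answer as dividing by Empty + Taken when the two actually agree. The sketch below is one way to check that on the sample data before relying on it. The names wide, pct, and consistent are illustrative, and pivot is used here under the assumption that each (year, type) pair appears once.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "year": [2014, 2014, 2014, 2013, 2013, 2013, 2012, 2012, 2012],
    "type": ["Total", "Empty", "Taken"] * 3,
    "value": [100, 50, 50, 2000, 100, 1900, 50, 45, 5],
})

# One row per year, with Empty, Taken and Total as columns.
wide = df.pivot(index="year", columns="type", values="value")

# Percentages relative to the reported Total for each year.
pct = wide[["Empty", "Taken"]].div(wide["Total"], axis=0).mul(100)

# Sanity check: does Empty + Taken match the reported Total in every year?
consistent = (wide["Empty"] + wide["Taken"]).eq(wide["Total"])

print(pct)
print(consistent)
```

If consistent is False for some year, the two approaches will give different percentages for that year, and you would need to decide which denominator you mean.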
You can keep only the records whose type is Empty or Taken, then groupby year and apply func. In func, you can set type as the index, then look up the required values and calculate the percentages. The x passed to func is a DataFrame holding the type and value columns for one group.
def func(x):
    x = x.set_index('type')
    total = x['value'].sum()
    return [(x.loc['Empty', 'value']/total)*100, (x.loc['Taken', 'value']/total)*100]
temp = (df[df['type'].isin({'Empty', 'Taken'})]
.groupby('year')[['type', 'value']]
.apply(lambda x: func(x)))
temp
year
2012 [90.0, 10.0]
2013 [5.0, 95.0]
2014 [50.0, 50.0]
dtype: object
Convert the result into the required DataFrame:
pd.DataFrame(temp.values.tolist(), index=temp.index, columns=['Empty', 'Taken'])
Empty Taken
year
2012 90.0 10.0
2013 5.0 95.0
2014 50.0 50.0
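If you'd rather avoid apply with a Python function, a possible alternative (not what the answer above does) is to compute each group's sum with groupby(...).transform('sum') and divide column-wise. This is only a sketch on the question's sample data; parts, pct, and out are names I made up.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "year": [2014, 2014, 2014, 2013, 2013, 2013, 2012, 2012, 2012],
    "type": ["Total", "Empty", "Taken"] * 3,
    "value": [100, 50, 50, 2000, 100, 1900, 50, 45, 5],
})

# Keep only the Empty and Taken rows; copy so the new column is not
# written onto a view of df.
parts = df[df["type"].isin(["Empty", "Taken"])].copy()

# transform('sum') returns the per-year sum aligned with each original row,
# so the division is element-wise.
parts["pct"] = parts["value"] / parts.groupby("year")["value"].transform("sum") * 100

# Reshape to one row per year with Empty and Taken as columns.
out = parts.pivot(index="year", columns="type", values="pct")
print(out)
```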