
Pandas DataFrame: Calculate percentage difference between rows?

I have a year-wise DataFrame in which each row has three fields: year, type and value. I'm trying to calculate the percentage of taken vs. empty. For example, year 2014 has a total of 100, of which 50 are empty and 50 are taken, so it is 50% empty and 50% taken, as shown in final_df.

df

   year   type  value
0  2014  Total    100
1  2014  Empty     50
2  2014  Taken     50
3  2013  Total   2000
4  2013  Empty    100
5  2013  Taken   1900
6  2012  Total     50
7  2012  Empty     45
8  2012  Taken      5
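To make the answers below reproducible, here is a minimal sketch that rebuilds the sample frame from the question (values copied from the table above; `df` matches the question's name). It also checks that Empty + Taken equals Total in every year, which the answers that divide by Total implicitly rely on.

```python
import pandas as pd

# Sample data copied from the question's table
df = pd.DataFrame({
    "year":  [2014, 2014, 2014, 2013, 2013, 2013, 2012, 2012, 2012],
    "type":  ["Total", "Empty", "Taken"] * 3,
    "value": [100, 50, 50, 2000, 100, 1900, 50, 45, 5],
})

# Sum of the Empty and Taken rows per year
parts = df[df["type"] != "Total"].groupby("year")["value"].sum()
# The reported Total per year
totals = df[df["type"] == "Total"].set_index("year")["value"]

# .eq aligns both Series on the year index before comparing
consistent = parts.eq(totals).all()
print(consistent)
```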

Final df

   year  Empty  Taken
0  2014     50     50
0  2013    ...    ...
0  2012    ...    ...

Should I shift the cells up and then calculate the percentage, or is there a better method?
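Shifting rows is not really needed. One hedged alternative (not taken from the answers below) is to broadcast each year's Total onto every row of that year with `groupby(...).transform` and then divide. This does not depend on the row order within a year. A minimal sketch, assuming the column names from the question:

```python
import pandas as pd

df = pd.DataFrame({
    "year":  [2014, 2014, 2014, 2013, 2013, 2013, 2012, 2012, 2012],
    "type":  ["Total", "Empty", "Taken"] * 3,
    "value": [100, 50, 50, 2000, 100, 1900, 50, 45, 5],
})

# Keep only the Total values (other rows become NaN), then spread the
# per-year Total to every row of that year; max() skips the NaNs
total = df["value"].where(df["type"] == "Total").groupby(df["year"]).transform("max")

# Percentage of each row relative to its year's Total
df["pct"] = df["value"] / total * 100

# Drop the Total rows, leaving Empty/Taken percentages in long format
long_pct = df[df["type"] != "Total"]
print(long_pct)
```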

You can use pivot_table:

new = df[df['type'] != 'Total']
res = (new.pivot_table(index='year', columns='type', values='value')
          .sort_values(by='year', ascending=False)
          .reset_index())

which gets you:

res
      year  Empty  Taken
0     2014     50     50
1     2013    100   1900
2     2012     45      5
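Because each (year, type) pair occurs only once here, plain `DataFrame.pivot` also works and keeps the integer values. Unlike `pivot_table`, it raises a `ValueError` instead of aggregating (mean by default) if a pair is duplicated. Here is a sketch that swaps in `pivot` and `sort_index` for the answer's `pivot_table`/`sort_values`:

```python
import pandas as pd

df = pd.DataFrame({
    "year":  [2014, 2014, 2014, 2013, 2013, 2013, 2012, 2012, 2012],
    "type":  ["Total", "Empty", "Taken"] * 3,
    "value": [100, 50, 50, 2000, 100, 1900, 50, 45, 5],
})

res = (df[df["type"] != "Total"]
       .pivot(index="year", columns="type", values="value")
       .sort_index(ascending=False)   # newest year first, as in the answer
       .reset_index()
       .rename_axis(columns=None))    # drop the leftover "type" label on the columns
print(res)
```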

Then you can get the percentage for each column:

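# Row-wise sum of the two parts (equal to Total in this data)
total = res['Empty'] + res['Taken']
for col in ['Empty', 'Taken']:
    # Share of each part as a fraction between 0 and 1
    res[col + '_perc'] = res[col] / total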


year  Empty  Taken  Empty_perc  Taken_perc                                     
2014     50     50        0.50        0.50
2013    100   1900        0.05        0.95
2012     45      5        0.90        0.10
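You can also write the loop as a single vectorised division with `DataFrame.div(..., axis=0)`, which divides every row by that row's own total. This is an alternative to the answer's loop, not a correction of it. A minimal sketch that rebuilds `res` the same way as above:

```python
import pandas as pd

df = pd.DataFrame({
    "year":  [2014, 2014, 2014, 2013, 2013, 2013, 2012, 2012, 2012],
    "type":  ["Total", "Empty", "Taken"] * 3,
    "value": [100, 50, 50, 2000, 100, 1900, 50, 45, 5],
})
res = (df[df["type"] != "Total"]
       .pivot_table(index="year", columns="type", values="value")
       .sort_values(by="year", ascending=False)
       .reset_index())

parts = res[["Empty", "Taken"]]
# Divide each row by its own Empty + Taken sum, then rename the new columns
perc = parts.div(parts.sum(axis=1), axis=0).add_suffix("_perc")
res = res.join(perc)
print(res)
```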

As @sophods pointed out, you can use pivot_table to rearrange your DataFrame. To add to that answer: since you're after the percentages, I suggest keeping the 'Total' record and then applying your calculation:

# pivot your data, keeping the Total column
res = df.pivot_table(index='year', columns='type', values='value').reset_index()
# calculate percentages of empty and taken (as fractions of Total)
res['Empty'] = res['Empty'] / res['Total']
res['Taken'] = res['Taken'] / res['Total']
# final dataframe
res = res[['year', 'Empty', 'Taken']]
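You can collapse the two assignments into one `div` call against the Total column. Multiplying by 100 gives percentages on the 0 to 100 scale used in the question's final_df; the answer's code produces fractions such as 0.5 instead. A sketch under those assumptions:

```python
import pandas as pd

df = pd.DataFrame({
    "year":  [2014, 2014, 2014, 2013, 2013, 2013, 2012, 2012, 2012],
    "type":  ["Total", "Empty", "Taken"] * 3,
    "value": [100, 50, 50, 2000, 100, 1900, 50, 45, 5],
})
res = df.pivot_table(index="year", columns="type", values="value")

final_df = (res[["Empty", "Taken"]]
            .div(res["Total"], axis=0)     # each year's parts divided by that year's Total
            .mul(100)                      # fractions -> percentages
            .sort_index(ascending=False)   # 2014, 2013, 2012 as in the question
            .reset_index()
            .rename_axis(columns=None))
print(final_df)
```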

You can keep only the records whose type is Empty or Taken, then group by year and apply func. Inside func, set type as the index, look up the required values and calculate the percentages. The x passed to func is the DataFrame for one year, containing that group's type and value columns.

def func(x):
    # x holds the 'type' and 'value' rows for one year
    x = x.set_index('type')
    total = x['value'].sum()
    return [(x.loc['Empty', 'value'] / total) * 100,
            (x.loc['Taken', 'value'] / total) * 100]

temp = (df[df['type'].isin({'Empty', 'Taken'})]
        .groupby('year')[['type', 'value']]
        .apply(func))
temp

year
2012    [90.0, 10.0]
2013    [5.0, 95.0] 
2014    [50.0, 50.0]
dtype: object

Convert the result into the required DataFrame:

pd.DataFrame(temp.values.tolist(), index=temp.index, columns=['Empty', 'Taken'])
       Empty    Taken
year        
2012    90.0    10.0
2013    5.0     95.0
2014    50.0    50.0
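`groupby(...).apply` with a Python function that returns lists works, but you can get the same numbers without a custom function. Divide each value by its group sum from `groupby(...).transform('sum')`, then pivot. A minimal sketch of that alternative (a different technique from the answer above, shown for comparison):

```python
import pandas as pd

df = pd.DataFrame({
    "year":  [2014, 2014, 2014, 2013, 2013, 2013, 2012, 2012, 2012],
    "type":  ["Total", "Empty", "Taken"] * 3,
    "value": [100, 50, 50, 2000, 100, 1900, 50, 45, 5],
})

# Keep only the Empty and Taken rows; copy so adding a column is safe
sub = df[df["type"].isin(["Empty", "Taken"])].copy()

# Each row's share of its year's Empty + Taken sum, in percent
sub["pct"] = sub["value"] / sub.groupby("year")["value"].transform("sum") * 100

# One row per year, one column per type (pivot sorts the years ascending)
out = sub.pivot(index="year", columns="type", values="pct").rename_axis(columns=None)
print(out)
```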

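Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.

Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM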