Compute difference between rows in pandas dataframe

I'd like to compute the difference between two categories in a dataframe. For example, in the following case, I want to compute the difference in income between male and female for each job. However, some jobs are done by only males or only females. What is an efficient way to do this? Thanks.

import pandas as pd

pd.DataFrame({'job': ['a', 'a', 'b', 'b', 'c'], 'gender':['M', 'F', 'M', 'F', 'M'], 'income':[300, 200, 450, 400, 350]})

Out[3]: 
  gender  income job
0      M     300   a
1      F     200   a
2      M     450   b
3      F     400   b
4      M     350   c

You could do a pivot so that the male and female pay for the same job end up on the same row. Then you can compare them visually, or run other row-based code.

import pandas as pd

df = pd.DataFrame({'job': ['a', 'a', 'b', 'b', 'c'], 'gender':['M', 'F', 'M', 'F', 'M'], 'income':[300, 200, 450, 400, 350]})

compare_income_by_gender_df = df.pivot(index='job', columns='gender', values='income')

print(compare_income_by_gender_df)

resulting in

python pivot.py
gender    F    M
job             
a       200  300
b       400  450
c       NaN  350
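
To get the actual difference the question asks for, one option is to subtract the two pivoted columns. This is only a sketch building on the pivot above, not part of the original answer; the names `wide` and `diff` are made up for illustration. Jobs that have only one gender come out as NaN, since there is nothing to subtract.

```python
import pandas as pd

df = pd.DataFrame({
    'job': ['a', 'a', 'b', 'b', 'c'],
    'gender': ['M', 'F', 'M', 'F', 'M'],
    'income': [300, 200, 450, 400, 350],
})

# One row per job, one column per gender; missing combinations become NaN.
wide = df.pivot(index='job', columns='gender', values='income')

# Column-wise subtraction (hypothetical column name 'diff').
# NaN propagates for jobs done by only one gender, such as job 'c'.
wide['diff'] = wide['M'] - wide['F']

print(wide)
```

With the sample data, the difference is 100 for job a, 50 for job b, and NaN for job c. If you would rather drop the incomplete jobs, `wide.dropna()` should do it.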

See also: Pandas Reshaping and Pivot Tables
