Compute difference between rows in pandas dataframe
I'd like to compute the difference between two categories in a dataframe. For example, in the following case, I want to compute the difference in income between male and female for each job. However, some jobs are done by only males or only females. What is an efficient way to do it? Thanks.
import pandas as pd
pd.DataFrame({'job': ['a', 'a', 'b', 'b', 'c'], 'gender':['M', 'F', 'M', 'F', 'M'], 'income':[300, 200, 450, 400, 350]})
Out[3]:
  gender  income job
0      M     300   a
1      F     200   a
2      M     450   b
3      F     400   b
4      M     350   c
You could pivot the data so that the male and female incomes for the same job end up on the same row. Then you can compare them visually, or run other row-based code.
import pandas as pd

df = pd.DataFrame({'job': ['a', 'a', 'b', 'b', 'c'], 'gender':['M', 'F', 'M', 'F', 'M'], 'income':[300, 200, 450, 400, 350]})
# One row per job, one column per gender; a job with no entry for a gender gets NaN.
compare_income_by_gender_df = df.pivot(index='job', columns='gender', values='income')
print(compare_income_by_gender_df)
resulting in:
python pivot.py
gender    F    M
job
a       200  300
b       400  450
c       NaN  350
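To get the difference itself, one option is to subtract the two pivoted columns. This is a minimal sketch, assuming each (job, gender) pair appears at most once (pivot raises an error on duplicate pairs); the names wide and diff are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'job': ['a', 'a', 'b', 'b', 'c'],
    'gender': ['M', 'F', 'M', 'F', 'M'],
    'income': [300, 200, 450, 400, 350],
})

# Reshape so each job is one row with an F column and an M column.
wide = df.pivot(index='job', columns='gender', values='income')

# Column-wise subtraction; jobs missing either gender yield NaN.
wide['diff'] = wide['M'] - wide['F']

print(wide)
```

Jobs done by only one gender (job c here) come out as NaN in diff, so they can be dropped with wide.dropna(subset=['diff']) or kept as a marker of missing data. If a job can have several rows per gender, pivot_table with an aggregation such as aggfunc='mean' may be a better fit than pivot.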
See also: Pandas Reshaping and Pivot Tables