How do I use pandas groupby() to show the value of 2 things per one column?
I have been trying to use pandas to create a DataFrame that reports the number of graduates working in jobs that require a college degree ('college_jobs') and in jobs that do not ('non_college_jobs'). Note: the DataFrame I am working with is named recent_grads.
I tried the following code:
df1 = recent_grads.groupby(['major_category']).college_jobs.non_college_jobs.sum()
or
df1 = recent_grads.groupby(['major_category']).recent_grads['college_jobs','non_college_jobs'].sum()
or
df1 = recent_grads.groupby(['major_category']).recent_grads['college_jobs'],['non_college_jobs'].sum()
None of them worked. What am I supposed to do? Can somebody give me a simple explanation? I have been reading through the pandas documentation and did not find the explanation I wanted.
Here is the head of the DataFrame:
rank major_code major major_category \
0 1 2419 PETROLEUM ENGINEERING Engineering
1 2 2416 MINING AND MINERAL ENGINEERING Engineering
2 3 2415 METALLURGICAL ENGINEERING Engineering
3 4 2417 NAVAL ARCHITECTURE AND MARINE ENGINEERING Engineering
4 5 2405 CHEMICAL ENGINEERING Engineering
total sample_size men women sharewomen employed ... \
0 2339 36 2057 282 0.120564 1976 ...
1 756 7 679 77 0.101852 640 ...
2 856 3 725 131 0.153037 648 ...
3 1258 16 1123 135 0.107313 758 ...
4 32260 289 21239 11021 0.341631 25694 ...
part_time full_time_year_round unemployed unemployment_rate median \
0 270 1207 37 0.018381 110000
1 170 388 85 0.117241 75000
2 133 340 16 0.024096 73000
3 150 692 40 0.050125 70000
4 5180 16697 1672 0.061098 65000
p25th p75th college_jobs non_college_jobs low_wage_jobs
0 95000 125000 1534 364 193
1 55000 90000 350 257 50
2 50000 105000 456 176 0
3 43000 80000 529 102 0
4 50000 75000 18314 4440 972
[5 rows x 21 columns]
You could first select the columns you're interested in from the initial DataFrame and then perform the groupby and summation, as below:
recent_grads[['major_category', 'college_jobs', 'non_college_jobs']].groupby('major_category').sum()
Conversely, if you skip the initial column filter and call .sum() directly on recent_grads.groupby('major_category'), the sum will be applied to every numeric column it can be computed for.
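To make the two approaches above concrete, here is a minimal sketch on a toy stand-in for recent_grads. The two Engineering rows reuse values from the head shown in the question; the "Business" row and its major name are made up for illustration. The sketch also shows an equivalent spelling that groups first and then selects the columns with a list inside the brackets, which is roughly what the attempts in the question were reaching for. Passing numeric_only=True in the last call is an assumption about your pandas version: in recent releases a bare .sum() may also try to combine string columns.

```python
import pandas as pd

# Toy stand-in for recent_grads. The Engineering rows come from the head
# in the question; the Business row is invented for illustration.
recent_grads = pd.DataFrame({
    "major": ["PETROLEUM ENGINEERING",
              "MINING AND MINERAL ENGINEERING",
              "HYPOTHETICAL BUSINESS MAJOR"],
    "major_category": ["Engineering", "Engineering", "Business"],
    "college_jobs": [1534, 350, 100],
    "non_college_jobs": [364, 257, 200],
})

# Approach from the answer: select the columns first, then group and sum.
filtered_first = (
    recent_grads[["major_category", "college_jobs", "non_college_jobs"]]
    .groupby("major_category")
    .sum()
)

# Equivalent: group first, then select the columns with a list (double brackets).
selected_after = (
    recent_grads.groupby("major_category")[["college_jobs", "non_college_jobs"]]
    .sum()
)

# No column filter: sum every numeric column per group.
# numeric_only=True keeps string columns such as "major" out of the result.
all_numeric = recent_grads.groupby("major_category").sum(numeric_only=True)

print(selected_after)
```

The result is indexed by major_category. If you would rather have it as an ordinary column, calling .reset_index() on the result should do that.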