I am trying to use pandas to create a DataFrame that reports, for each major category, the number of graduates working in jobs that require a college degree ('college_jobs') and in jobs that do not ('non_college_jobs'). Note: the DataFrame I am working with is named recent_grads.
I tried the following code:
df1 = recent_grads.groupby(['major_category']).college_jobs.non_college_jobs.sum()
or
df1 = recent_grads.groupby(['major_category']).recent_grads['college_jobs','non_college_jobs'].sum()
or
df1 = recent_grads.groupby(['major_category']).recent_grads['college_jobs'],['non_college_jobs'].sum()
None of them worked. What am I supposed to do? Can somebody give me a simple explanation? I have been reading through the pandas documentation and did not find the explanation I was looking for.
Here is the head of the DataFrame:
   rank  major_code                                      major major_category  \
0     1        2419                      PETROLEUM ENGINEERING    Engineering
1     2        2416             MINING AND MINERAL ENGINEERING    Engineering
2     3        2415                  METALLURGICAL ENGINEERING    Engineering
3     4        2417  NAVAL ARCHITECTURE AND MARINE ENGINEERING    Engineering
4     5        2405                       CHEMICAL ENGINEERING    Engineering

   total  sample_size    men  women  sharewomen  employed  ...  \
0   2339           36   2057    282    0.120564      1976  ...
1    756            7    679     77    0.101852       640  ...
2    856            3    725    131    0.153037       648  ...
3   1258           16   1123    135    0.107313       758  ...
4  32260          289  21239  11021    0.341631     25694  ...

   part_time  full_time_year_round  unemployed  unemployment_rate  median  \
0        270                  1207          37           0.018381  110000
1        170                   388          85           0.117241   75000
2        133                   340          16           0.024096   73000
3        150                   692          40           0.050125   70000
4       5180                 16697        1672           0.061098   65000

   p25th   p75th  college_jobs  non_college_jobs  low_wage_jobs
0  95000  125000          1534               364            193
1  55000   90000           350               257             50
2  50000  105000           456               176              0
3  43000   80000           529               102              0
4  50000   75000         18314              4440            972

[5 rows x 21 columns]
You could filter the initial DataFrame by the columns you're interested in and then perform the groupby and summation as below:
recent_grads[['major_category', 'college_jobs', 'non_college_jobs']].groupby('major_category').sum()
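As a rough, self-contained sketch of that line in action, the snippet below rebuilds only the five rows shown in the question's head() output (just the three relevant columns) and runs the same filter, groupby and sum. The variable name `totals` is made up for illustration, and the real recent_grads has more rows and categories than this sample.

```python
import pandas as pd

# Only the five rows visible in the question's head() output;
# the full recent_grads DataFrame is assumed to contain many more.
recent_grads = pd.DataFrame({
    "major_category": ["Engineering"] * 5,
    "college_jobs": [1534, 350, 456, 529, 18314],
    "non_college_jobs": [364, 257, 176, 102, 4440],
})

# Keep the grouping key plus the two columns of interest,
# then group by category and sum within each group.
totals = (
    recent_grads[["major_category", "college_jobs", "non_college_jobs"]]
    .groupby("major_category")
    .sum()
)

print(totals)
```

The result is indexed by major_category, with one row per category and one column for each of the two job counts.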
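By contrast, if you skip the initial column filter and call .sum() directly on recent_grads.groupby('major_category'), the sum is applied to every column it can be applied to, not just the two you want. In older pandas versions that means all numeric columns; in pandas 2.0 and later you may need to pass numeric_only=True to get the same behaviour.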
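It may also help to see why the attempts in the question fail: a GroupBy object has no recent_grads attribute, and several columns have to be selected with a list inside the brackets. The sketch below, again using only the five rows from the question (plus the major column as a stand-in for the non-numeric data), shows selecting the columns after the groupby, and the unfiltered sum restricted to numeric columns. The names `selected` and `all_numeric` are illustrative only.

```python
import pandas as pd

# Sample built from the question's head() output; the string column
# "major" stands in for the non-numeric columns of the full DataFrame.
recent_grads = pd.DataFrame({
    "major": [
        "PETROLEUM ENGINEERING",
        "MINING AND MINERAL ENGINEERING",
        "METALLURGICAL ENGINEERING",
        "NAVAL ARCHITECTURE AND MARINE ENGINEERING",
        "CHEMICAL ENGINEERING",
    ],
    "major_category": ["Engineering"] * 5,
    "college_jobs": [1534, 350, 456, 529, 18314],
    "non_college_jobs": [364, 257, 176, 102, 4440],
})

# Select columns on the GroupBy object itself: note the double brackets,
# i.e. a list of column names inside the indexing brackets.
selected = recent_grads.groupby("major_category")[
    ["college_jobs", "non_college_jobs"]
].sum()

# Sum without filtering, explicitly limited to numeric columns.
all_numeric = recent_grads.groupby("major_category").sum(numeric_only=True)

print(selected)
print(all_numeric)
```

Selecting after the groupby gives the same totals as filtering the DataFrame first, so which form to use is mostly a matter of taste.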