How do I use pandas groupby() to show the summed values of two columns per group?

Question

I have been trying to use pandas to create a DataFrame that reports the number of graduates working in jobs that require a college degree ('college_jobs') and in jobs that do not ('non_college_jobs'). Note: the DataFrame I am working with is named recent_grads.

I tried the following code:

df1 = recent_grads.groupby(['major_category']).college_jobs.non_college_jobs.sum()

or

df1 = recent_grads.groupby(['major_category']).recent_grads['college_jobs','non_college_jobs'].sum()

or

df1 = recent_grads.groupby(['major_category']).recent_grads['college_jobs'],['non_college_jobs'].sum()

None of them worked. What am I supposed to do? Can somebody give me a simple explanation? I have been reading through the pandas documentation but did not find the explanation I was looking for.

Here is the head of the DataFrame:

   rank  major_code                                      major major_category  \
0     1        2419                      PETROLEUM ENGINEERING    Engineering   
1     2        2416             MINING AND MINERAL ENGINEERING    Engineering   
2     3        2415                  METALLURGICAL ENGINEERING    Engineering   
3     4        2417  NAVAL ARCHITECTURE AND MARINE ENGINEERING    Engineering   
4     5        2405                       CHEMICAL ENGINEERING    Engineering   

   total  sample_size    men  women  sharewomen  employed      ...        \
0   2339           36   2057    282    0.120564      1976      ...         
1    756            7    679     77    0.101852       640      ...         
2    856            3    725    131    0.153037       648      ...         
3   1258           16   1123    135    0.107313       758      ...         
4  32260          289  21239  11021    0.341631     25694      ...         

   part_time  full_time_year_round  unemployed  unemployment_rate  median  \
0        270                  1207          37           0.018381  110000   
1        170                   388          85           0.117241   75000   
2        133                   340          16           0.024096   73000   
3        150                   692          40           0.050125   70000   
4       5180                 16697        1672           0.061098   65000   

   p25th   p75th college_jobs  non_college_jobs  low_wage_jobs  
0  95000  125000         1534               364            193  
1  55000   90000          350               257             50  
2  50000  105000          456               176              0  
3  43000   80000          529               102              0  
4  50000   75000        18314              4440            972  

[5 rows x 21 columns]

Answer 1

You could filter the initial DataFrame down to the columns you're interested in and then perform the groupby and summation, as below:

recent_grads[['major_category', 'college_jobs', 'non_college_jobs']].groupby('major_category').sum()
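To see the filter-then-group approach run end to end, here is a minimal sketch on a small stand-in for recent_grads. The Engineering rows reuse values from the head shown in the question; the "Business" rows are invented purely for illustration.

```python
import pandas as pd

# Stand-in for recent_grads. Engineering values are taken from the head in the
# question; the two "Business" rows are hypothetical, added so that there is
# more than one group to sum over.
recent_grads = pd.DataFrame({
    "major_category": ["Engineering", "Engineering", "Engineering",
                       "Business", "Business"],
    "total": [2339, 756, 856, 1000, 2000],
    "college_jobs": [1534, 350, 456, 100, 200],
    "non_college_jobs": [364, 257, 176, 300, 400],
})

# Double brackets pass a *list* of column names, which returns a DataFrame
# holding only those columns; groupby + sum then works on that subset.
df1 = (
    recent_grads[["major_category", "college_jobs", "non_college_jobs"]]
    .groupby("major_category")
    .sum()
)
print(df1)
```

The result is indexed by major_category, with one row per category and one column for each of the two job counts.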

Conversely, if you skip the initial column filter and call .sum() directly on recent_grads.groupby('major_category'), the sum is applied to every numeric column.
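A closely related variant, sketched here as a suggestion rather than as part of the original answer, is to group first and then select the columns on the grouped object with a list of names. The sample data below is a hypothetical stand-in for recent_grads. Because the handling of non-numeric columns in a plain .sum() can differ between pandas versions, passing numeric_only=True makes the "all numeric columns" behavior explicit.

```python
import pandas as pd

# Hypothetical stand-in for recent_grads, including one text column ("major")
# to show what happens to non-numeric data.
recent_grads = pd.DataFrame({
    "major": ["A", "B", "C", "D"],
    "major_category": ["Engineering", "Engineering", "Business", "Business"],
    "total": [2339, 756, 1000, 2000],
    "college_jobs": [1534, 350, 100, 200],
    "non_college_jobs": [364, 257, 300, 400],
})

by_category = recent_grads.groupby("major_category")

# Select the two columns on the grouped object (note the double brackets),
# then sum within each group.
df2 = by_category[["college_jobs", "non_college_jobs"]].sum()

# Sum every numeric column; numeric_only=True drops the text column "major".
df_all = by_category.sum(numeric_only=True)

print(df2)
print(df_all)
```

This also hints at why the attempts in the question failed: chaining .college_jobs.non_college_jobs picks one column and then looks for an attribute on it, whereas a list inside the brackets selects both columns at once.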
