Python: Count Number of Unique Values within a Data Frame within a Group

I have a data frame named 'sal' that contains salary information for employees across a number of years.

I am trying to calculate the number of job titles that were held by only one person in the year 2013. I know from a manual check that the answer is 202.

I'm using the following method:

sal[sal['Year'] == 2013]['JobTitle'].nunique()

Data Sample:

    Id  EmployeeName    JobTitle    BasePay OvertimePay OtherPay    Benefits    TotalPay    TotalPayBenefits    Year    Notes   Agency  Status
72926   Gregory P Suhr  Chief of Police 319275.01   0   20007.06    86533.21    339282.07   425815.28   2013        San Francisco   
72927   Joanne M Hayes-White    Chief, Fire Department  313686.01   0   23236   85431.39    336922.01   422353.4    2013        San Francisco   
72928   Samson  Lai Battalion Chief, Fire Suppress  186236.42   131217.63   29648.27    57064.95    347102.32   404167.27   2013        San Francisco   
72929   Ellen G Moffatt Asst Med Examiner   272855.51   23727.91    38954.54    66198.92    335537.96   401736.88   2013        San Francisco   
72930   Robert L Shaw   Dep Dir for Investments, Ret    315572.01   0   0   82849.66    315572.01   398421.67   2013        San Francisco   
72931   David L Franklin    Asst Chf of Dept (Fire Dept)    215265.6    87985.24    30637.48    62890.36    333888.32   396778.68   2013        San Francisco   
72932   Harlan L Kelly-Jr   Executive Contract Employee 313312.52   0   0   82319.51    313312.52   395632.03   2013        San Francisco   
72933   John L Martin   Dept Head V 311758.96   0   1098.64 82476.85    312857.6    395334.45   2013        San Francisco   
72934   Edward D Reiskin    Gen Mgr, Public Trnsp Dept  305307.89   0   0   80860.6 305307.89   386168.49   2013        San Francisco   
72935   Thomas A Siragusa   Asst Chf of Dept (Fire Dept)    215265.6    88028.54    21526.49    61288.58    324820.63   386109.21   2013        San Francisco   
72936   Amy P Hart  Dept Head V 286480.44   0   17188.71    80077.63    303669.15   383746.78   2013        San Francisco   
72937   Yifang  Qian    Senior Physician Specialist 203710  0   119176.84   58810.96    322886.84   381697.8    2013        San Francisco   
72938   Michael J Biel  Deputy Chief 3  278964  0   17587.86    77708.48    296551.86   374260.34   2013        San Francisco   
72939   Raymond A Guzman    Dep Chf of Dept (Fire Dept) 270756.03   0   24181.02    77474.92    294937.05   372411.97   2013        San Francisco   
72940   Marty A Ross    Battalion Chief, Fire Suppress  186236.43   88345.08    38035.09    58991.75    312616.6    371608.35   2013        San Francisco   
72941   Mark A Gonzales Dep Chf of Dept (Fire Dept) 270756.01   0   20236.5 77408.16    290992.51   368400.67   2013        San Francisco   
72942   Mark J Johnson  Battalion Chief, Fire Suppress  186236.41   101466.96   23994.92    56134.3 311698.29   367832.59   2013        San Francisco   
72943   Bryan W Rubenstein  Battalion Chief, Fire Suppress  186236.45   94450.92    30313.49    56508.46    311000.86   367509.32   2013        San Francisco   
72944   Gary L Altenberg    Lieutenant, Fire Suppression    135903.02   163477.81   20994.96    46030.76    320375.79   366406.55   2013        San Francisco   
72945   John J Loftus   Deputy Chief 3  274126.5    0   13358.1 75909.1 287484.6    363393.7    2013        San Francisco   
72946   Edwin M Lee Mayor   285446.37   0   0   77105.29    285446.37   362551.66   2013        San Francisco   
72947   Michael J Morris    Assistant Deputy Chief 2    124054  0   202322.37   35929.84    326376.37   362306.21   2013        San Francisco   
72948   David  Shinn    Deputy Chief 3  278964  0   6428.79 76680.57    285392.79   362073.36   2013        San Francisco   
72949   Arthur W Kenney Asst Chf of Dept (Fire Dept)    213308.64   49139.25    36262.42    60756.95    298710.31   359467.26   2013        San Francisco   
72950   Lorrie A Kalos  Battalion Chief, Fire Suppress  186236.49   87457.68    28003.53    57030.95    301697.7    358728.65   2013        San Francisco   
72951   Lyn  Tomioka    Deputy Chief 3  278964  0   3536.35 76113.13    282500.35   358613.48   2013        San Francisco   
72952   Denise A Schmitt    Deputy Chief 3  278964  0   3536.39 75367.15    282500.39   357867.54   2013        San Francisco   
72953   Rudy J Castellanos  Battalion Chief, Fire Suppress  186236.42   94274.25    19022.95    55351.53    299533.62   354885.15   2013        San Francisco   
72954   Susan  Currin   Adm, SFGH Medical Center    271831.5    0   5000    75511.72    276831.5    352343.22   2013        San Francisco   
72955   Thomas F Abbott Battalion Chief, Fire Suppress  186236.41   84382.38    23279.44    56184.01    293898.23   350082.24   2013        San Francisco   
72956   Naomi M Kelly   Dept Head V 270641.5    0   3000    74867.87    273641.5    348509.37   2013        San Francisco   
72957   Trent E Rhorer  Dept Head V 270641.56   0   3000    74769.34    273641.56   348410.9    2013        San Francisco   
72958   Barbara A Garcia    Dept Head V 270591.04   0   3050.5  74769.33    273641.54   348410.87   2013        San Francisco   
72959   Robert F Postel Asst Chf of Dept (Fire Dept)    212244.54   62490.6 13450.16    58778.57    288185.3    346963.87   2013        San Francisco   
72960   Jeffrey J Barden    Captain, Fire Suppression   155174.49   124293.83   18151.93    49001.55    297620.25   346621.8    2013        San Francisco   

The method above returns an incorrect answer of 1051. Could someone explain why my logic is incorrect and suggest an alternative method?

Thanks!

To answer my own question, my logic was wrong:

sal[sal['Year'] == 2013]['JobTitle'].nunique()

will count the number of unique job titles. So if there are 10 people with the job title 'Engineer', that title is counted only once, and the result is the number of distinct titles rather than the number of titles held by a single person.
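To see this behaviour in isolation, here is a minimal sketch on a made-up frame. The name `toy` and its rows are hypothetical and are not taken from the real 'sal' data; only the column names match.

```python
import pandas as pd

# Hypothetical toy data (NOT the real 'sal' frame): three Engineers,
# one Mayor and one Clerk in 2013, plus one Clerk in 2014.
toy = pd.DataFrame({
    "JobTitle": ["Engineer", "Engineer", "Engineer", "Mayor", "Clerk", "Clerk"],
    "Year": [2013, 2013, 2013, 2013, 2013, 2014],
})

# Keep only the 2013 rows, then look at the JobTitle column.
titles_2013 = toy.loc[toy["Year"] == 2013, "JobTitle"]

# nunique() counts each distinct title once, however many people hold it.
distinct_titles = titles_2013.nunique()
print(distinct_titles)  # 3 (Engineer, Mayor, Clerk)
```

'Engineer' contributes 1 to the result even though three people hold it, which is why nunique() over-counts relative to the question being asked.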

The answer I was looking for was 'the number of job titles that were represented by only one person', which I found using:

(sal[sal['Year'] == 2013]['JobTitle'].value_counts() == 1).sum()
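As a rough illustration of why this works, the sketch below applies the same idea to a made-up frame. The name `toy` and its rows are hypothetical, not the real 'sal' data. It also shows one possible alternative using `duplicated(keep=False)`, which is not from the original post.

```python
import pandas as pd

# Hypothetical toy data (NOT the real 'sal' frame).
toy = pd.DataFrame({
    "JobTitle": ["Engineer", "Engineer", "Engineer", "Mayor", "Clerk", "Clerk"],
    "Year": [2013, 2013, 2013, 2013, 2013, 2014],
})

titles_2013 = toy.loc[toy["Year"] == 2013, "JobTitle"]

# value_counts() gives the number of rows per title:
# Engineer -> 3, Mayor -> 1, Clerk -> 1 (the 2014 Clerk is filtered out).
counts = titles_2013.value_counts()

# Comparing with 1 yields a boolean Series; summing it counts the True values.
single_holder_titles = int((counts == 1).sum())

# Alternative sketch: duplicated(keep=False) flags every row whose title
# appears more than once, so negating it keeps only the one-person titles.
single_holder_alt = int((~titles_2013.duplicated(keep=False)).sum())

print(single_holder_titles, single_holder_alt)  # 2 2
```

Both expressions count 'Mayor' and 'Clerk' but not 'Engineer', whereas nunique() on the same data would return 3.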
