
Aggregate - na.omit and na.pass in R with factor (group by factor)?

I have a data set containing salary test data. Not all cells have values, so I used na.action=na.pass, na.rm=TRUE, but I get an error, apparently because I want to aggregate by JobTitle, which is a factor.

So far I have developed the code below:

aggregate(salaries$JobTitle, 
list(pay = salaries$TotalPay),
FUN=mean,
na.action=na.pass,
na.rm=TRUE)

My test data has the following columns:

'data.frame':   104 obs. of  36 variables:
 $ Id              : int  1 2 3 4 5 6 7 8 9 10 ...
 $ EmployeeName    : Factor w/ 11 levels "","ALBERT PARDINI",..: 10 7 2 4 11 6 3 5 9 8 ...
 $ JobTitle        : Factor w/ 9 levels "","ASSISTANT DEPUTY CHIEF II",..: 8 4 4 9 6 2 3 7 3 5 ...
 $ BasePay         : num  167411 155966 212739 77916 134402 ...
 $ OvertimePay     : num  0 245132 106088 56121 9737 ...
 $ OtherPay        : num  400184 137811 16453 198307 182235 ...
 $ Benefits        : logi  NA NA NA NA NA NA ...
 $ TotalPay        : num  567595 538909 335280 332344 326373 ...
 $ TotalPayBenefits: num  567595 538909 335280 332344 326373 ...
 $ Year            : int  2011 2011 2011 2011 2011 2011 2011 2011 2011 2011 ...
 $ Notes           : logi  NA NA NA NA NA NA ...
 $ Agency          : Factor w/ 2 levels "","San Francisco": 2 2 2 2 2 2 2 2 2 2 ..

The error message that comes up is:

Warning messages:
1: In mean.default(X[[i]], ...) :
  argument is not numeric or logical: returning NA
2: In mean.default(X[[i]], ...) :
  argument is not numeric or logical: returning NA

etc.

I have tried with salaries$Id and it works like magic, so I assume the code is correct and perhaps I need to change the data type of JobTitle?

If we are getting the mean of 'TotalPay' grouped by 'JobTitle', the formula method would be

aggregate(TotalPay~JobTitle, salaries, mean, na.rm=TRUE, na.action=na.pass)

Or use

aggregate(salaries$TotalPay, list(salaries$JobTitle), FUN=mean, na.rm=TRUE) 
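For context, the warnings in the question most likely come from the argument order: in this form of aggregate the first argument is the data to summarise and the second is the list of grouping variables, so the original call asked for the mean of the JobTitle factor within each TotalPay value. A minimal sketch of that, using a small made-up data frame (the values are hypothetical; only the column names follow the question):

```r
# Hypothetical toy data; only the column names come from the question.
toy <- data.frame(
  JobTitle = factor(c("A", "A", "B", "B")),
  TotalPay = c(100, NA, 300, 500)
)

# mean() is not defined for a factor: it warns
# "argument is not numeric or logical: returning NA" and returns NA.
factor_mean <- suppressWarnings(mean(toy$JobTitle))

# Summarise the numeric column and group by the factor instead.
by_title <- aggregate(toy$TotalPay, list(JobTitle = toy$JobTitle),
                      FUN = mean, na.rm = TRUE)
print(by_title)
```

Under these assumptions no type conversion of JobTitle is needed; the factor is simply used as the grouping variable rather than as the value being averaged.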

data

set.seed(24)
salaries <- data.frame(JobTitle = sample(LETTERS[1:5], 20,
       replace=TRUE), TotalPay= sample(c(1:20, NA), 20))
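One way to see what na.action=na.pass changes in the formula method is to aggregate two columns at once. With the default na.action=na.omit, a row is dropped as soon as any variable in the formula is NA; with na.pass the rows are kept, and na.rm=TRUE is forwarded to mean so NAs are removed per column. The sketch below uses a hypothetical data frame (the values are invented for illustration, and BasePay and OtherPay are borrowed from the question's column list):

```r
# Hypothetical data: OtherPay is missing in the second row only.
pay <- data.frame(
  JobTitle = c("A", "A", "B", "B"),
  BasePay  = c(10, 20, 30, 40),
  OtherPay = c(1, NA, 3, 5)
)

# Default na.action = na.omit: row 2 is dropped entirely,
# so the BasePay mean for "A" is computed from 10 alone.
omit_res <- aggregate(cbind(BasePay, OtherPay) ~ JobTitle, pay, mean)

# na.pass keeps row 2; na.rm = TRUE is passed on to mean(),
# so the BasePay mean for "A" uses both 10 and 20.
pass_res <- aggregate(cbind(BasePay, OtherPay) ~ JobTitle, pay, mean,
                      na.rm = TRUE, na.action = na.pass)

print(omit_res)
print(pass_res)
```

With a single response column, as in TotalPay~JobTitle, the two settings give the same means for every group that has at least one non-missing value, so the distinction matters mainly when several columns are summarised together.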
