
How to calculate Standard Deviation in Oracle SQL Developer?

I have an employees table:

CREATE TABLE employees (
  employeeid NUMERIC(9) NOT NULL,
  firstname  VARCHAR(10),
  lastname   VARCHAR(20),
  deptcode   CHAR(5),
  salary     NUMERIC(9, 2),
  PRIMARY KEY (employeeid)
);

and I want to calculate the standard deviation of salary.

This is the code I am using:

select avg(salary) as mean, sqrt(sum((salary-avg(salary))*(salary-avg(salary)))/count(employeeid)) as SD 
from employees
group by employeeid;

I am getting this error:

ORA-00979: not a GROUP BY expression
00979. 00000 -  "not a GROUP BY expression"
*Cause:    
*Action:
Error at Line: 260 Column: 12

Line 260, column 12 is avg(salary).

How can I sort this out?

Oracle has a built-in function to calculate standard deviation: STDDEV.

You use it just like any other aggregate function:

select stddev(salary) 
from employees;

I'd just use the STDDEV function:

SELECT avg(salary) as mean, 
       stddev(salary) as sd
  FROM employees;

It doesn't make sense to group by employeeid, since it is unique (it is the table's primary key). Talking about the average salary per employee is meaningless; you want the average salary across all employees (or per department, or per some other unit you can aggregate over).
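To illustrate why the grouping column matters, here is a minimal Python sketch (not Oracle code) that mimics GROUP BY followed by an average and a population standard deviation. The rows and the helper salary_stats_by are hypothetical, made up for illustration. Grouping by a unique key leaves one salary per group, so every standard deviation comes out as 0; grouping by deptcode produces a meaningful spread.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical rows (employeeid, deptcode, salary); illustrative data only.
rows = [
    (1, "SALES", 3000.0),
    (2, "SALES", 5000.0),
    (3, "IT", 4000.0),
    (4, "IT", 6000.0),
]

def salary_stats_by(rows, key_index):
    """Group salaries by the column at key_index (hypothetical helper) and
    return {group_key: (mean, population standard deviation)}."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key_index]].append(row[2])
    return {key: (mean(vals), pstdev(vals)) for key, vals in groups.items()}

# Grouping by the unique key: each group holds one salary, so SD is 0.
by_employee = salary_stats_by(rows, 0)
# Grouping by department: each group holds several salaries.
by_dept = salary_stats_by(rows, 1)
print(by_employee)
print(by_dept)
```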

The expression salary-avg(salary) can't be evaluated: avg(salary) is not available while the individual rows are being processed, only after all of the rows have been retrieved.
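The point about avg(salary) not being available yet is easiest to see when the calculation is written out as two passes over the data. The following is a minimal Python sketch, not Oracle's implementation. The function name population_sd_two_pass and the sample salaries are hypothetical. The mean has to be finished in the first pass before any (salary - mean) deviation can be formed in the second.

```python
import math

def population_sd_two_pass(salaries):
    """Return (mean, population standard deviation), computed in two passes.

    Pass 1 computes the mean; only then can pass 2 form each
    (salary - mean) deviation. A single aggregate that mixes salary
    with avg(salary) is trying to do both passes at once.
    """
    n = len(salaries)
    if n == 0:
        raise ValueError("need at least one salary")
    mean = sum(salaries) / n                          # pass 1: the "avg(salary)"
    sq_dev = sum((s - mean) ** 2 for s in salaries)   # pass 2: needs the mean
    return mean, math.sqrt(sq_dev / n)                # divide by n: population SD

print(population_sd_two_pass([1000.0, 2000.0, 3000.0, 4000.0]))
```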

I would suggest computing the average in a subquery and joining it to the main query:

-- The subquery returns a single row holding the overall average salary;
-- grouping it by employeeid would just return each employee's own salary.
select avg(e.salary) as mean,
       sqrt(sum((e.salary - m.mean_salary) * (e.salary - m.mean_salary)) / count(e.salary)) as SD
from employees e
cross join (select avg(salary) as mean_salary
            from employees) m;
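To check that a version of this pattern, with the subquery computing one overall average, yields the population standard deviation, here is a minimal sketch that runs it against an in-memory SQLite database via Python's standard sqlite3 module. SQLite stands in for Oracle here, which is an assumption: only portable SQL is used, the table is cut down to two columns, the data is made up, and SQRT is applied in Python because not every SQLite build provides it.

```python
import math
import sqlite3

# In-memory SQLite database standing in for the Oracle table (assumption:
# only portable SQL is exercised; SQLite is not Oracle).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employeeid INTEGER PRIMARY KEY, salary REAL)")
# Hypothetical sample data, for illustration only.
conn.executemany(
    "INSERT INTO employees (employeeid, salary) VALUES (?, ?)",
    [(1, 1000.0), (2, 2000.0), (3, 3000.0), (4, 4000.0)],
)

# The subquery computes the overall mean once; the outer query then
# aggregates squared deviations from that single value.
query = """
    SELECT AVG(e.salary) AS mean,
           SUM((e.salary - m.mean_salary) * (e.salary - m.mean_salary))
               / COUNT(e.salary) AS variance
    FROM employees e
    CROSS JOIN (SELECT AVG(salary) AS mean_salary FROM employees) m
"""
mean, variance = conn.execute(query).fetchone()
sd = math.sqrt(variance)  # square root taken in Python, not in SQL
print(mean, sd)
conn.close()
```

Because the subquery returns exactly one row, the cross join simply attaches the same mean to every employee row, so no GROUP BY is needed in the outer query.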

I thought you had to include the GROUP BY column in the SELECT list:

select employeeid, avg(salary) as mean, sqrt(sum((salary-avg(salary))*(salary-avg(salary)))/count(employeeid)) as SD 
from employees
group by employeeid;

But on further reflection, the query doesn't make much sense unless it's historical data. An employee id ought to be unique to a single employee. Unless this is an average over time, there should be only one salary per employee, so your mean will be that salary and the standard deviation will be zero.

A better query might be the average of all salaries. In that case, remove the GROUP BY.

One more nitpick: the formula you're using is more properly called the population standard deviation. The sample standard deviation divides by (n-1) instead of n.
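The difference matters when comparing against the built-in: Oracle's STDDEV returns the sample standard deviation (dividing by n-1), as does STDDEV_SAMP, while STDDEV_POP returns the population value that the question's formula computes. As a minimal sketch with made-up salaries, Python's statistics module exposes the same two variants as pstdev and stdev.

```python
import statistics

salaries = [1000.0, 2000.0, 3000.0, 4000.0]  # hypothetical data

# Population SD: divides the summed squared deviations by n.
# This matches the question's formula (.../count(employeeid)) and
# Oracle's STDDEV_POP.
population_sd = statistics.pstdev(salaries)

# Sample SD: divides by (n - 1) instead. Oracle's STDDEV and STDDEV_SAMP
# return this value (they differ only when there is a single input row).
sample_sd = statistics.stdev(salaries)

print(population_sd, sample_sd)
```

With only four values the gap is noticeable; it shrinks as n grows, because n/(n-1) approaches 1.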

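Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.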

© 2020-2024 STACKOOM.COM