Rails and queries - Custom query to get duplicated records
For the most part I try not to write custom SQL queries in my apps, but I came across a case and I was wondering if this would be one of those cases where I would be better off doing it.
I'm using PostgreSQL with this specific app. I want to return only employees that have been double-booked for a task. Here are my models:
User
has_many :user_jobs
Fields
- id
- name
- address
- phone
Jobs
has_many :user_jobs
has_many :users, through: :user_jobs
Fields
- id
- date
- start_time
- end_time
UserJobs
belongs_to :user
belongs_to :job
Fields
- id
- job_id
- user_id
UserJobs is the table that links each job to the employees assigned to it, but the date and time of each job are stored in the jobs table.
I would like to return something like:
user (employee) - date and time of job 1 - date and time of job 2
EDIT: added more schema detail
CREATE TABLE user_jobs (
id integer NOT NULL,
job_id integer,
job_date date,
notes text,
job_rating integer,
created_at timestamp without time zone,
updated_at timestamp without time zone,
user_id integer
);
CREATE TABLE jobs (
id integer NOT NULL,
date date,
start_time time without time zone,
end_time time without time zone,
notes text
);
CREATE TABLE users (
id integer NOT NULL,
email character varying(255) DEFAULT ''::character varying NOT NULL,
name character varying(255),
address character varying(255),
phone character varying(255),
picture character varying(255),
status character varying(255) DEFAULT 'active'::character varying
);
Thanks in advance
Postgres 9.2+
It's kind of a gnarly query:
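WITH alljobs AS (
  SELECT uj.id AS user_job_id, uj.user_id, j.date, j.start_time, j.end_time
  FROM jobs j
  INNER JOIN user_jobs uj ON uj.job_id = j.id
)
SELECT DISTINCT q1.user_id
FROM alljobs q1
JOIN alljobs q2 ON
  q1.user_id = q2.user_id
  AND q1.user_job_id < q2.user_job_id  -- compare each pair once, never an assignment with itself
  AND tsrange(q1.date + q1.start_time, q1.date + q1.end_time) && tsrange(q2.date + q2.start_time, q2.date + q2.end_time)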
Explanation:
- WITH alljobs effectively assigns the name alljobs to the given query. That query is just a joined list of all job assignments with their start and end times.
- SELECT DISTINCT q1.user_id returns only the IDs of users who are double-booked. This is technically what you asked for, though you'll likely want to expand this select to get back more useful information. Try SELECT * while debugging.
- FROM alljobs q1 JOIN alljobs q2 joins the assignments against themselves, which is necessary to compare each assignment to every other assignment.
- q1.user_id = q2.user_id: we only care about collisions for a single user. You could alter this if you wanted to answer related questions such as "who is working together?"
- tsrange: a Postgres built-in range function that creates a range from two timestamps. By default the lower bound is inclusive and the upper bound exclusive ('[)'), so a job ending at 10:00 does not collide with one starting at 10:00. There are similar functions for dates and other timestamp types (daterange, tstzrange).
- &&: the Postgres range "overlaps" operator; it is true when the two ranges have at least one point in common.
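To make the self-join logic concrete, here is a minimal Ruby sketch that mimics the query in memory rather than in Postgres. The Assignment struct and the double_booked_user_ids method are hypothetical names, not part of Rails or the schema above, and each assignment is assumed to already have its job's date and times combined into Time values. Comparing each unordered pair once (combination(2)) means an assignment is never compared with itself, and the overlap test uses strict < on both ends, like tsrange's default '[)' bounds.

```ruby
# Hypothetical stand-in for one row of the alljobs CTE: a user_jobs row
# joined to its job, with date + time already combined into Time values.
Assignment = Struct.new(:user_job_id, :user_id, :starts_at, :ends_at)

# Half-open overlap test: each range starts before the other ends,
# which matches tsrange's default '[)' bounds and the && operator.
def overlaps?(a, b)
  a.starts_at < b.ends_at && b.starts_at < a.ends_at
end

# Mirrors SELECT DISTINCT q1.user_id FROM alljobs q1 JOIN alljobs q2 ON ...
def double_booked_user_ids(assignments)
  assignments
    .combination(2)                 # each unordered pair once, never a row with itself
    .select { |a, b| a.user_id == b.user_id && overlaps?(a, b) }
    .map { |a, _b| a.user_id }
    .uniq                           # DISTINCT
end

at = ->(hour) { Time.utc(2024, 1, 15, hour) }
rows = [
  Assignment.new(1, 10, at.(9), at.(12)),
  Assignment.new(2, 10, at.(11), at.(13)),  # overlaps row 1: user 10 is double-booked
  Assignment.new(3, 20, at.(9), at.(10)),
  Assignment.new(4, 20, at.(10), at.(11)),  # back-to-back with row 3: not a clash
]

p double_booked_user_ids(rows)  # => [10]
```

In the app itself you would keep this in SQL (for example through ActiveRecord's find_by_sql) so the database does the pairing; the Ruby version is only a way to reason about what the join returns.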
Postgres <9.2
You can replace tsrange and && with your own overlap logic. Two ranges overlap when each one starts before the other one ends, so with date added in, the join condition looks something like:
(q1.date + q1.start_time) < (q2.date + q2.end_time) AND (q2.date + q2.start_time) < (q1.date + q1.end_time)
This also catches two jobs with identical start times and, like tsrange's default bounds, does not treat back-to-back jobs (one ending exactly when the next starts) as a collision.
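For the pre-9.2 version, here is a small Ruby sketch of the same two-comparison test applied to separate date and time-of-day values. combine and overlaps? are hypothetical helpers; times are assumed to be "HH:MM" strings interpreted as UTC, and each job is assumed to end later on the same date it starts (no jobs running past midnight).

```ruby
require "date"

# Hypothetical helper: date + "HH:MM" -> Time (treated as UTC),
# the Ruby analogue of SQL's date + time.
def combine(date, hhmm)
  hour, min = hhmm.split(":").map(&:to_i)
  Time.utc(date.year, date.month, date.day, hour, min)
end

# Two ranges overlap iff each one starts before the other ends.
# Strict < on both sides treats ranges as half-open [start, end),
# so a job ending at 10:00 does not clash with one starting at 10:00.
def overlaps?(start1, end1, start2, end2)
  start1 < end2 && start2 < end1
end

day = Date.new(2024, 1, 15)
job        = [combine(day, "09:00"), combine(day, "12:00")]
same_start = [combine(day, "09:00"), combine(day, "10:00")]
touching   = [combine(day, "12:00"), combine(day, "13:00")]
next_day   = [combine(day + 1, "09:00"), combine(day + 1, "12:00")]

p overlaps?(*job, *same_start)  # => true  (identical start times)
p overlaps?(*job, *touching)    # => false (end of one == start of other)
p overlaps?(*job, *next_day)    # => false (same times, different date)
```

The next_day case is why the date has to be part of the comparison: on time-of-day alone those two jobs would look like a clash.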
Or, since you mentioned that double-booked jobs always have the same start time, and that's really all you care about, you can do something simpler in that case:
SELECT user_jobs.user_id, jobs.date + jobs.start_time AS starts_at, COUNT(*)
FROM user_jobs INNER JOIN jobs ON user_jobs.job_id = jobs.id
GROUP BY user_jobs.user_id, jobs.date + jobs.start_time
HAVING COUNT(*) > 1
That will give you all user IDs that have duplicates. To get the respective jobs, you can wrap that in an outer query:
SELECT user_jobs.user_id, user_jobs.job_id, jobs.date + jobs.start_time AS starts_at
FROM user_jobs
INNER JOIN jobs ON user_jobs.job_id = jobs.id
INNER JOIN (
  SELECT user_jobs.user_id, jobs.date + jobs.start_time AS starts_at, COUNT(*)
  FROM user_jobs INNER JOIN jobs ON user_jobs.job_id = jobs.id
  GROUP BY user_jobs.user_id, jobs.date + jobs.start_time
  HAVING COUNT(*) > 1
) dups ON dups.user_id = user_jobs.user_id
  AND dups.starts_at = jobs.date + jobs.start_time
ORDER BY user_jobs.user_id, jobs.date + jobs.start_time
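The group-then-join-back idea can also be sketched in plain Ruby, ending in the "user - job 1 - job 2" shape asked for in the question. Booking and duplicated_bookings are hypothetical names; Array#tally (Ruby 2.7+) plays the role of COUNT(*) per (user_id, start time) group, and a group counts as a double booking once it has more than one row.

```ruby
# Hypothetical stand-in for a user_jobs row joined to its job.
Booking = Struct.new(:user_id, :job_id, :starts_at)

# Inner query: COUNT(*) per (user_id, start time), kept when > 1.
# Outer query: every booking whose group survived, ordered for display.
def duplicated_bookings(bookings)
  counts = bookings.map { |b| [b.user_id, b.starts_at] }.tally
  bookings
    .select { |b| counts[[b.user_id, b.starts_at]] > 1 }
    .sort_by { |b| [b.user_id, b.starts_at, b.job_id] }
end

t9  = Time.utc(2024, 1, 15, 9)
t13 = Time.utc(2024, 1, 15, 13)
bookings = [
  Booking.new(1, 101, t9),
  Booking.new(1, 100, t9),   # same user, same start: a double booking
  Booking.new(1, 102, t13),
  Booking.new(2, 100, t9),   # another user on job 100 is fine
]

# One line per user: "user - job A @ time - job B @ time"
duplicated_bookings(bookings).group_by(&:user_id).each do |user_id, rows|
  jobs = rows.map { |r| "job #{r.job_id} @ #{r.starts_at.strftime('%F %R')}" }
  puts [user_id, *jobs].join(" - ")
end
# prints: 1 - job 100 @ 2024-01-15 09:00 - job 101 @ 2024-01-15 09:00
```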
Schema Suggestion
You're making life more difficult for yourself by having separate date and time columns. Why not just make start_time and end_time timestamps? Then you don't always need to be adding them together, and you can still get the date by casting (for example, start_time::date).
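If you do switch to timestamps, the one-off data conversion is just the date + time addition done once per row. A minimal Ruby sketch, assuming times are "HH:MM" strings interpreted as UTC; OldJob, NewJob, to_timestamp, and convert are hypothetical names, not Rails APIs (in a real app this would more likely be a single SQL UPDATE inside a migration).

```ruby
require "date"

# Hypothetical shapes: the current jobs row (date + two times of day)
# and the suggested one (two full timestamps).
OldJob = Struct.new(:id, :date, :start_time, :end_time)  # times as "HH:MM"
NewJob = Struct.new(:id, :starts_at, :ends_at)

# Same idea as SQL's date + time, but done once instead of in every query.
def to_timestamp(date, hhmm)
  hour, min = hhmm.split(":").map(&:to_i)
  Time.utc(date.year, date.month, date.day, hour, min)
end

def convert(old)
  NewJob.new(old.id,
             to_timestamp(old.date, old.start_time),
             to_timestamp(old.date, old.end_time))
end

new_job = convert(OldJob.new(7, Date.new(2024, 1, 15), "09:00", "12:00"))

# The calendar date is still recoverable (in SQL: starts_at::date).
p(new_job.starts_at.to_date == Date.new(2024, 1, 15))  # => true
```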