Rails and queries - Custom query to get duplicated records

For the most part I try not to write custom SQL queries in my apps, but I came across a case and I was wondering whether this is one of those cases where I would be better off doing so.

I'm using PostgreSQL with this specific app. I want to return only the employees who have been double booked for a task.

My models are below.

User

has_many :user_jobs

    Fields
    - id
    - name
    - address
    - phone

Job

has_many :user_jobs
has_many :users, through: :user_jobs

    Fields
    - id
    - date
    - start_time
    - end_time

UserJob

belongs_to :user
belongs_to :job

    Fields
    - id
    - job_id
    - user_id

UserJobs is the table that holds the jobs and the employees assigned to each job, but the date and time of each job are saved in the jobs table.

I would like to return something like

user (employee) - date and time job1 - date and time job2

EDIT: added more schema detail

CREATE TABLE user_jobs (
    id integer NOT NULL,
    job_id integer,
    job_date date,
    notes text,
    job_rating integer,
    created_at timestamp without time zone,
    updated_at timestamp without time zone,
    user_id integer
);


CREATE TABLE jobs (
    id integer NOT NULL,
    date date,
    start_time time without time zone,
    end_time time without time zone,
    notes text
);


CREATE TABLE users (
    id integer NOT NULL,
    email character varying(255) DEFAULT ''::character varying NOT NULL,
    name character varying(255),
    address character varying(255),
    phone character varying(255),
    picture character varying(255),
    status character varying(255) DEFAULT 'active'::character varying
);

Thanks in advance

Postgres 9.2+

It's kind of a gnarly query:

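WITH alljobs AS (
  -- list only the needed columns: jobs and user_jobs both have id and notes
  SELECT uj.user_id, uj.job_id, j.date, j.start_time, j.end_time
  FROM jobs j INNER JOIN user_jobs uj ON uj.job_id = j.id
)
SELECT DISTINCT q1.user_id
FROM alljobs q1
JOIN alljobs q2 ON
      q1.user_id = q2.user_id
  AND q1.job_id <> q2.job_id  -- an assignment always overlaps itself, so skip that pairing
  AND tsrange(q1.date + q1.start_time, q1.date + q1.end_time) && tsrange(q2.date + q2.start_time, q2.date + q2.end_time)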

Explanation:

  • WITH alljobs effectively assigns the name alljobs to the given query. That query is just a joined list of all job assignments with start and end times.
  • SELECT DISTINCT q1.user_id returns only the IDs of users that are double booked. This is technically what you asked for, though you'll likely want to expand this select to get back more useful information. I'd recommend using SELECT * while debugging.
  • FROM alljobs q1 JOIN alljobs q2 joins the assignments against themselves, which is necessary to compare each assignment to every other assignment.
  • q1.user_id = q2.user_id: we only care about collisions for a single user. You could alter this if you wanted to answer related questions such as "who is working together?"
  • tsrange is a Postgres built-in range function that creates a range from two timestamps. There are similar functions for dates and other types of timestamps. These range types were only introduced in 9.2.
  • && is the Postgres range operator for overlap: it is true when the two ranges have at least one point in common.
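
To make the self-join easier to reason about, here is a minimal plain-Ruby sketch of the same idea, run over in-memory rows rather than the database. The Assignment struct, the double_booked_user_ids method, and the sample data are hypothetical names made up for illustration; each range is treated as half-open (start included, end excluded), so two back-to-back jobs do not count as a collision.

```ruby
# Hypothetical in-memory stand-in for one row of the alljobs join.
Assignment = Struct.new(:user_id, :job_id, :starts_at, :ends_at)

# Returns the sorted ids of users who have at least one pair of overlapping jobs.
def double_booked_user_ids(assignments)
  assignments
    .group_by(&:user_id)                     # same role as q1.user_id = q2.user_id
    .select do |_user_id, rows|
      # combination(2) yields every distinct pair, so a row is never compared with itself
      rows.combination(2).any? do |a, b|
        a.starts_at < b.ends_at && b.starts_at < a.ends_at
      end
    end
    .keys
    .sort
end

assignments = [
  Assignment.new(1, 10, Time.utc(2024, 1, 15, 9),  Time.utc(2024, 1, 15, 12)),
  Assignment.new(1, 11, Time.utc(2024, 1, 15, 11), Time.utc(2024, 1, 15, 14)), # overlaps job 10
  Assignment.new(2, 10, Time.utc(2024, 1, 15, 9),  Time.utc(2024, 1, 15, 12)),
  Assignment.new(2, 12, Time.utc(2024, 1, 15, 12), Time.utc(2024, 1, 15, 15))  # starts as job 10 ends
]

p double_booked_user_ids(assignments) # => [1]
```

For real data the SQL version is the better fit, since it lets the database do the pairing; the sketch is only meant to show what the join conditions are checking.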

Postgres <9.2

You can replace tsrange and && with your own intersection logic, which looks something like: q1.start_time < q2.end_time AND q2.start_time < q1.end_time. (Add in date as well.)

Or, since you specified that the start time is always the same, and that's really all you care about, you can do something simpler in that case:

SELECT user_id, date + start_time AS starts_at, COUNT(*)
FROM user_jobs INNER JOIN jobs ON job_id = jobs.id
GROUP BY user_id, date + start_time
HAVING COUNT(*) > 1

That will give you all user IDs that have duplicates. To get the respective jobs, you can wrap that in an outer query.

SELECT user_jobs.user_id, user_jobs.job_id, jobs.date + jobs.start_time AS starts_at
FROM user_jobs INNER JOIN jobs ON job_id = jobs.id INNER JOIN (
  SELECT user_id, date + start_time AS starts_at, COUNT(*)
  FROM user_jobs INNER JOIN jobs ON job_id = jobs.id
  GROUP BY user_id, date + start_time
  HAVING COUNT(*) > 1
) dups ON dups.user_id = user_jobs.user_id
      AND dups.starts_at = jobs.date + jobs.start_time
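
The grouping approach can also be sketched in plain Ruby, which may help when checking the query's results by hand. Booking and duplicate_start_bookings are hypothetical names for illustration, and the rows are assumed to be already joined so that each one carries its job's full start timestamp. A group counts as a duplicate once it holds more than one row.

```ruby
# Hypothetical row shape: one user_jobs row joined to its job's start timestamp.
Booking = Struct.new(:user_id, :job_id, :starts_at)

# In-memory analogue of GROUP BY user_id, start timestamp followed by a HAVING filter:
# returns a hash of [user_id, starts_at] => bookings, keeping only repeated keys.
def duplicate_start_bookings(bookings)
  bookings
    .group_by { |b| [b.user_id, b.starts_at] }
    .select { |_key, rows| rows.size > 1 }
end

nine = Time.utc(2024, 1, 15, 9)
bookings = [
  Booking.new(1, 10, nine),
  Booking.new(1, 11, nine), # same user, same start as job 10
  Booking.new(2, 10, nine)
]

duplicate_start_bookings(bookings).each do |(user_id, starts_at), rows|
  puts "user #{user_id} at #{starts_at}: jobs #{rows.map(&:job_id).join(', ')}"
end
```

Keeping the grouped rows, rather than only a count, gives the same information as the outer query: which jobs collide for each user.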

Schema Suggestion

You're making life more difficult for yourself by having separate date and time columns. Why not just make start_time and end_time timestamps? Then you don't always need to be adding them together, and you can still get the date by casting it.
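
As a rough illustration of that suggestion on the Ruby side, here is a small sketch. TimestampedJob and the two helper methods are hypothetical, and the attribute names starts_at and ends_at are assumptions rather than columns from the schema above.

```ruby
require 'date'

# Hypothetical job that stores one full timestamp per endpoint
# instead of a date column plus two time-of-day columns.
TimestampedJob = Struct.new(:id, :starts_at, :ends_at)

# The calendar date is still available by converting the start timestamp.
def job_date(job)
  job.starts_at.to_date
end

# A shift that runs past midnight is representable without extra columns.
def crosses_midnight?(job)
  job.starts_at.to_date != job.ends_at.to_date
end

night_shift = TimestampedJob.new(1, Time.utc(2024, 1, 15, 22), Time.utc(2024, 1, 16, 2))
puts job_date(night_shift)          # the date the job starts on
puts crosses_midnight?(night_shift) # whether it ends on a later date
```

With a single date plus start and end times of day, a job like this night shift would appear to end before it starts, which is one more reason to prefer full timestamps.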
