
Rails and queries - Custom query to get duplicated records

For the most part I try not to write custom SQL queries in my apps, but I came across a case where I wonder whether I would be better off writing one.

I'm using PostgreSQL with this specific app. I want to return only the employees that have been double booked for a task.

I have the following models:

User

has_many :user_jobs

    Fields
    - id
    - name
    - address
    - phone

Jobs

has_many :user_jobs
has_many :users, through: :user_jobs

    Fields
    - id
    - date
    - start_time
    - end_time

UserJobs

belongs_to :user
belongs_to :job

    Fields
    - id
    - job_id
    - user_id

UserJobs is the join table that holds the jobs and the employees assigned to each job, but the date and time of each job are saved in the jobs table.

I would like to return something like

user (employee) - date and time of job 1 - date and time of job 2

EDIT: added more schema detail

CREATE TABLE user_jobs (
    id integer NOT NULL,
    job_id integer,
    job_date date,
    notes text,
    job_rating integer,
    created_at timestamp without time zone,
    updated_at timestamp without time zone,
    user_id integer
);


CREATE TABLE jobs (
    id integer NOT NULL,
    date date,
    start_time time without time zone,
    end_time time without time zone,
    notes text
);


CREATE TABLE users (
    id integer NOT NULL,
    email character varying(255) DEFAULT ''::character varying NOT NULL,
    name character varying(255),
    address character varying(255),
    phone character varying(255),
    picture character varying(255),
    status character varying(255) DEFAULT 'active'::character varying
);

Thanks in advance

Postgres 9.2+

It's kind of a gnarly query:

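WITH alljobs AS (
  SELECT uj.id AS user_job_id, uj.user_id, uj.job_id, j.date, j.start_time, j.end_time
  FROM jobs j INNER JOIN user_jobs uj ON uj.job_id = j.id
)
SELECT DISTINCT q1.user_id
FROM alljobs q1
JOIN alljobs q2 ON
      q1.user_id = q2.user_id
  AND q1.user_job_id < q2.user_job_id  -- never compare an assignment with itself
  AND tsrange(q1.date + q1.start_time, q1.date + q1.end_time) && tsrange(q2.date + q2.start_time, q2.date + q2.end_time)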

Explanation:

  • WITH alljobs effectively assigns the variable name alljobs to the given query. That query is just a joined list of all job assignments with start and end times.
  • SELECT DISTINCT q1.user_id returns only IDs of users that are double booked. This is technically what you asked for, though you'll likely want to expand this select to get back more useful information. I'd recommend using SELECT * while debugging.
  • FROM alljobs q1 JOIN alljobs q2 joins the list of assignments against itself, which is necessary to compare each assignment to every other assignment.
  • q1.user_id = q2.user_id We only care about collisions for a single user. You could alter this if you wanted to answer related questions such as "who is working together?"
  • tsrange is a Postgres built-in function that creates a range from two timestamps. There are similar range types for dates and for timestamps with time zone (daterange, tstzrange). These range types were only introduced in 9.2.
  • && is the Postgres range operator for overlap: it is true when the two ranges have any point in common.

Postgres <9.2

You can replace tsrange and && with your own overlap logic, which I think looks something like: q1.start_time < q2.end_time AND q2.start_time < q1.end_time. (Add in date as well.)
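To make the overlap test concrete, here is a minimal sketch in plain Ruby over rows that are already in memory. The Assignment struct, the overlaps? and double_bookings methods, and the sample data are all hypothetical names made up for illustration; they are not part of the app in the question. It assumes start and end have already been combined into full timestamps, and it treats a job that starts exactly when another ends as not overlapping.

```ruby
# Hypothetical stand-in for a user_jobs row joined to its job.
Assignment = Struct.new(:user_id, :job_id, :starts_at, :ends_at)

# Two intervals overlap when each one starts before the other one ends.
# Strict comparisons mean back-to-back jobs do not count as a clash.
def overlaps?(a, b)
  a.starts_at < b.ends_at && b.starts_at < a.ends_at
end

# Returns every pair of assignments for the same user whose times overlap.
def double_bookings(assignments)
  assignments.group_by(&:user_id).flat_map do |_user_id, rows|
    rows.combination(2).select { |a, b| overlaps?(a, b) }
  end
end

# Sample data (assumption): user 1 has two clashing jobs and one back-to-back job.
sample = [
  Assignment.new(1, 10, Time.utc(2024, 1, 5, 9, 0),  Time.utc(2024, 1, 5, 12, 0)),
  Assignment.new(1, 11, Time.utc(2024, 1, 5, 11, 0), Time.utc(2024, 1, 5, 14, 0)),
  Assignment.new(1, 12, Time.utc(2024, 1, 5, 14, 0), Time.utc(2024, 1, 5, 16, 0)),
  Assignment.new(2, 10, Time.utc(2024, 1, 5, 9, 0),  Time.utc(2024, 1, 5, 12, 0))
]

double_bookings(sample).each do |a, b|
  puts "user #{a.user_id}: job #{a.job_id} clashes with job #{b.job_id}"
end
```

Using combination(2) plays the role of the self-join: each pair of a user's assignments is checked once, and no assignment is ever compared with itself.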

Or, since you specified start time is always the same, and that's really all you care about, you can do something simpler in that case:

SELECT user_id, date + start_time, COUNT(*)
FROM user_jobs INNER JOIN jobs ON job_id = jobs.id
GROUP BY user_id, date + start_time
HAVING COUNT(*) > 1

That will give you all user IDs that have duplicates. To get the respective jobs, you can wrap that in an outer query.

SELECT user_jobs.user_id, user_jobs.job_id, jobs.date + jobs.start_time AS starts_at
FROM user_jobs INNER JOIN jobs ON user_jobs.job_id = jobs.id INNER JOIN (
  SELECT user_id, date + start_time AS starts_at, COUNT(*)
  FROM user_jobs INNER JOIN jobs ON job_id = jobs.id
  GROUP BY user_id, date + start_time
  HAVING COUNT(*) > 1
) dups ON dups.user_id = user_jobs.user_id
      AND dups.starts_at = jobs.date + jobs.start_time
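
If it helps to see the grouping idea outside of SQL, the following is a rough Ruby sketch of the same approach on in-memory rows. Booking and same_start_duplicates are hypothetical names invented for this example, and the sample rows are made up. A group counts as a duplicate as soon as it holds more than one row for the same user and start time.

```ruby
# Hypothetical stand-in for a user_jobs row joined to its job.
Booking = Struct.new(:user_id, :job_id, :starts_at)

# Groups bookings by user and start time and keeps only the groups with more
# than one row, i.e. users booked on several jobs that start at the same moment.
def same_start_duplicates(bookings)
  bookings
    .group_by { |b| [b.user_id, b.starts_at] }
    .select { |_key, rows| rows.size > 1 }
end

# Sample data (assumption): user 1 has two jobs starting at 09:00.
nine = Time.utc(2024, 1, 5, 9, 0)
noon = Time.utc(2024, 1, 5, 12, 0)
bookings = [
  Booking.new(1, 10, nine),
  Booking.new(1, 11, nine),
  Booking.new(1, 12, noon),
  Booking.new(2, 10, nine)
]

same_start_duplicates(bookings).each do |(user_id, starts_at), rows|
  puts "user #{user_id} at #{starts_at}: jobs #{rows.map(&:job_id).join(', ')}"
end
```

The hash returned by same_start_duplicates carries both pieces of information at once: the keys identify the double-booked user and time, and the values are the respective jobs.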

Schema Suggestion

You're making life more difficult for yourself by having separate date and time columns. Why not just make start_time and end_time timestamps? Then you don't always need to be adding them together, and you can still get the date by casting it.
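
As a small illustration of that point, here is a sketch in plain Ruby of what working with full timestamps looks like. TimedJob, job_date and duration_hours are hypothetical names used only for this example, and the night shift is made-up data.

```ruby
require 'date'

# Hypothetical job row whose start_time and end_time are full timestamps.
TimedJob = Struct.new(:start_time, :end_time)

# The calendar date is recovered from the timestamp, so no separate
# date column is needed.
def job_date(job)
  job.start_time.to_date
end

# Duration in hours; this also works when the job runs past midnight.
def duration_hours(job)
  (job.end_time - job.start_time) / 3600.0
end

# Sample data (assumption): a shift from 22:00 to 02:00 the next day.
night_shift = TimedJob.new(Time.utc(2024, 1, 5, 22, 0), Time.utc(2024, 1, 6, 2, 0))
puts job_date(night_shift)
puts duration_hours(night_shift)
```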
