简体   繁体   中英

mysql update command with multiple similar sub-selects

I wrote the following UPDATE command, but there's redundancy in the sub-selects. I'm not an expert in SQL and would appreciate help in making this query more efficient. Thanks ahead of time.

update trips
  set origin = 
  (select stop_name 
    from stops 
    inner join stop_times
    on stops.stop_id = stop_times.stop_id
    where stop_times.trip_id = trips.trip_id
    order by stop_sequence asc
    limit 1) 
  ,
  destination = 
  (select stop_name 
    from stops 
    inner join stop_times
    on stops.stop_id = stop_times.stop_id
    where stop_times.trip_id = trips.trip_id
    order by stop_sequence desc
    limit 1)
  ,
  starts = 
  (select arrival_time
    from stop_times
    where stop_times.trip_id = trips.trip_id
    order by stop_sequence asc
    limit 1) 
  ,
  ends = 
  (select arrival_time
    from stop_times
    where stop_times.trip_id = trips.trip_id
    order by stop_sequence desc
    limit 1)
;

Below are the relevant table definitions. There are approximately 72K trips, 8K stops, and 2 million stop_times. Maybe an average of 20? stops per trip (just guessing).

create table stop_times (
  trip_id varchar(255),
  arrival_time time,
  stop_id varchar(255),
  stop_sequence int unsigned,
) type=MyISAM;

alter table stop_times add index stop_id (stop_id(5));
alter table stop_times add index trip_id (trip_id(5));

create table stops (
  stop_id varchar(255),
  stop_name varchar(255),
  stop_lat float,
  stop_lon float,
  primary key (stop_id)
) type=MyISAM;

create table trips (
  route_id varchar(255),
  trip_id varchar(255), /* primary key is here */
  /* denormalized fields */
  origin varchar(255),
  destination varchar(255),
  starts time,
  ends time,
  primary key(trip_id)
) type=MyISAM;
alter table trips add index route_id (route_id(5));

First add a index on stop_times to include the trip_id and the stop_sequence columns

ALTER TABLE stop_times ADD PRIMARY KEY(trip_id, stop_sequence)

Then, try running this update:

update trips t JOIN (
    SELECT trip_id, MIN(stop_sequence) minS, MAX(stop_sequence) maxS 
    FROM stop_times
    GROUP BY trip_id
) tg ON t.trip_id = tg.trip_id
JOIN stop_times stFirst ON tg.trip_id = stFirst.trip_id AND stFirst.stop_sequence = tg.minS
JOIN stop_times stLast ON tg.trip_id = stLast.trip_id AND stLast.stop_sequence = tg.maxS
JOIN stops stFirstStop ON stFirst.stop_id = stFirstStop.stop_id
JOIN stops stLastStop ON stLast.stop_id = stLastStop.stop_id
SET t.origin = stFirstStop.stop_name,
    t.destination = stLastStop.stop_name,
    t.starts = stFirst.arrival_time,
    t.ends = stLast.arrival_time

Note: changing trip_id to a INT will give you better performance

Also, the trips table should store the origin_id and destination_id, which can later be joined to the stops table to find the name, instead of storing the name in all the rows

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM