
Get results from multiple tables using union or left join

I have a booking table with booking_id and booking_type columns. This table is linked to the booking_taxi and booking_bus tables through the foreign key booking_id.

booking :- booking_id | booking_type

booking_taxi :- booking_taxi_id | booking_id | booking_date

booking_bus :- booking_vus_id | booking_id | booking_date

I came up with two queries to get all the bookings with the respective booking date.

query 1:

select  bk.booking_id,
        bk.booking_type,
        case
          when booking_type = 3 then bbus.booking_date
          when booking_type = 2 then btaxi.pickup_date
        end as booking_date
    from booking bk
left join booking_taxi btaxi on btaxi.booking_id = bk.booking_id and bk.booking_type = 2
left join booking_bus bbus on bbus.booking_id = bk.booking_id and bk.booking_type = 3;

query 2:

select  bk.booking_id,
        bk.booking_type,
        btaxi.booking_date
from booking bk
inner join booking_taxi btaxi on btaxi.booking_id = bk.booking_id and bk.booking_type = 2
union all
select  bk.booking_id,
        bk.booking_type,
        bbus.booking_date
from booking bk
inner join booking_bus bbus on bbus.booking_id = bk.booking_id and bk.booking_type = 3;

Which one will have better performance?

First, if you want to know about relative performance, you should run the queries and see which performs better on your data and on your system. You can gather further information from explain.
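As a rough sketch, assuming a database that supports the explain keyword (MySQL and PostgreSQL both do, with different output formats), you could prefix a query with explain and compare the plans. The union all query from the question is used here; the same prefix works for the left join version.

```sql
-- Ask the optimizer for its plan instead of running the query.
explain
select bk.booking_id, bk.booking_type, btaxi.booking_date
from booking bk
inner join booking_taxi btaxi
        on btaxi.booking_id = bk.booking_id and bk.booking_type = 2
union all
select bk.booking_id, bk.booking_type, bbus.booking_date
from booking bk
inner join booking_bus bbus
        on bbus.booking_id = bk.booking_id and bk.booking_type = 3;
```

The plan only shows what the optimizer intends to do; timing both queries on realistic data is still the more reliable comparison.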

Second, the queries are not identical. They may happen to produce the same result set on your data, but they are not guaranteed to. In particular, if the second query uses union rather than union all, it removes duplicate rows and the first does not.

Without any other information, I would expect the first to have better performance when the second uses union, specifically because the second then incurs overhead for removing duplicate values; union all avoids that overhead. However, that would need to be tested.
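The difference between union and union all is easy to see on a tiny example. This is only a sketch: the table name booking_taxi_demo and its rows are made up for illustration.

```sql
-- Hypothetical sample data: the same (booking_id, booking_date) pair stored twice.
create table booking_taxi_demo (booking_id int, booking_date date);
insert into booking_taxi_demo values (1, '2024-01-01'), (1, '2024-01-01');

-- union all keeps every row from both branches: 2 rows come back.
select booking_id, booking_date from booking_taxi_demo where booking_id = 1
union all
select booking_id, booking_date from booking_taxi_demo where booking_id = 99;

-- union de-duplicates the combined result: only 1 row comes back.
select booking_id, booking_date from booking_taxi_demo where booking_id = 1
union
select booking_id, booking_date from booking_taxi_demo where booking_id = 99;
```

Removing duplicates usually requires a sort or a hash step, which is where the extra cost of union comes from.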

Also, the first will return bookings whose booking_type is not 1 or 2, with a null booking_date, because the left joins keep every row of booking. (I am assuming the 2/3 in the FROM clause is a typo.)

Personally, I prefer the first, although I am inclined to write it as:

select bk.booking_id, bk.booking_type,
       coalesce(bbus.booking_date, btaxi.pickup_date) as booking_date
from booking bk left join
     booking_taxi btaxi
     on btaxi.booking_id = bk.booking_id and
        bk.booking_type = 1 left join
     booking_bus bbus
     on bbus.booking_id = bk.booking_id and
        bk.booking_type = 2 and
        btaxi.booking_id is null
where btaxi.booking_id is not null or
      bbus.booking_id is not null;

There are three differences:

  • coalesce() instead of case. This is just shorter and easier to read.
  • The condition btaxi.booking_id is null on the second join, so that it filters out rows where the first join matches (this is actually redundant because the filter on booking_type does the same thing).
  • The where condition, which returns only bookings that have a match.

A nested loop join compares every record in one table against every record in the other. If there are M rows in one table and N in the second, the complexity becomes M×N (when no usable index is available).

Based on that theory, your second query using union will be more efficient.

The first thing that comes to mind is: Is the datamodel appropriate? Are bus and taxi bookings so very different from each other? Does one booking really consist of multiple bookings of one vehicle type on different dates?

This

  • booking : booking_id | booking_date | vehicle_id | trip_date
  • vehicle : vehicle_id | vehicle_type | id_company | ...

for instance may or may not be more appropriate. Querying the data will become much easier, if you find a more appropriate datamodel.
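A minimal sketch of that alternative model follows. The column types, the not null choices and the foreign key are assumptions; only the table and column names come from the outline above.

```sql
-- Column types are assumptions; the outline only lists column names.
create table vehicle (
  vehicle_id   int primary key,
  vehicle_type int not null,   -- e.g. 2 = taxi, 3 = bus (hypothetical codes)
  id_company   int
);

create table booking (
  booking_id   int primary key,
  booking_date date not null,
  vehicle_id   int not null,
  trip_date    date,
  foreign key (vehicle_id) references vehicle (vehicle_id)
);

-- All bookings with their vehicle type: one join, no union and no case.
select b.booking_id, v.vehicle_type, b.booking_date
from booking b
join vehicle v on v.vehicle_id = b.vehicle_id;
```

With this layout the date lives in a single column, so the question of union versus left join does not arise.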

As to your current datamodel and query:

  • The query indicates that your datamodel allows inconsistencies. You can have a booking of type 2 but any of bbus, btaxi, btrain rows associated with it. You should find a way to change your datamodel so this cannot occur.
  • The queries return different results as long as there can be other booking types or you don't add a where clause to the first query limiting booking rows to the required types.
  • Both queries are fine. I think the second reads a little better. (It should be UNION ALL of course not UNION [DISTINCT] .)
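To illustrate the point about limiting the first query to the required types, here is a sketch that uses the type values 2 and 3 from the question and assumes that booking_taxi has the booking_date column shown in the table listing (query 1 calls it pickup_date).

```sql
select bk.booking_id, bk.booking_type,
       coalesce(btaxi.booking_date, bbus.booking_date) as booking_date
from booking bk
left join booking_taxi btaxi
       on btaxi.booking_id = bk.booking_id and bk.booking_type = 2
left join booking_bus bbus
       on bbus.booking_id = bk.booking_id and bk.booking_type = 3
-- Drop bookings of any other type, which would otherwise appear with a null date.
where bk.booking_type in (2, 3);
```

Even with this filter, a type 2 or 3 booking that has no taxi or bus row still appears once with a null booking_date, whereas the inner join version omits it.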

As is, here is how I'd write the UNION ALL query:

select booking_id, 2 as booking_type, booking_date from booking_taxi
union all
select booking_id, 3 as booking_type, booking_date from booking_bus
order by booking_id, booking_date;

You can look at the query plan in SQL Server and compare the performance of your queries.

However, when you use a join and your tables are ordered by an index, SQL Server compares them with a nested join, so when you create your tables with a primary key and a foreign key the join performs well. Of course, you can improve performance further with some indexes. With a union, the engine first gets the first query's result and sorts it, then gets the second query's result and sorts that, and then compares the results and removes duplicate rows (union all skips the duplicate removal). So in my view the join is better than the union.
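If the foreign key columns are not already indexed, indexes along these lines may help both queries. The index names are made up for illustration, and some engines (MySQL's InnoDB, for example) already create such an index when the foreign key is declared.

```sql
-- Hypothetical index names; one index per foreign key column used in the joins.
create index idx_booking_taxi_booking_id on booking_taxi (booking_id);
create index idx_booking_bus_booking_id on booking_bus (booking_id);
```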

It seems the query with union all is faster than the query with left joins (at least for this scenario).

The left join query runs a full scan three times (with nested loops):

[Image: explain output for the left join query]

But with union all there are only two table scans:

[Image: explain output for the union all query]
