In the following query, I am joining a samples table with 45324 items. The result gives me only 39426 rows, and none of them contains an empty SUM, SUM_YIELD, or anything similar. Could anyone explain why LEFT, RIGHT, and INNER JOINs all give me the same outcome?
SELECT
`gs_prod`.`samples`.`id` AS `id`,
`gs_prod`.`samples`.`customer_name` AS `customer_name`,
`qcs_demux_stats_view`.`sample_name` AS `sample_name`,
FORMAT(
SUM(`qcs_demux_stats_view`.`clusters`),
0
) AS `SUM`,
FORMAT(
SUM(`qcs_demux_stats_view`.`yield`),
0
) AS `SUM_YIELD`,
ROUND(
(
SUM(
(
`qcs_demux_stats_view`.`perc_q30` * `qcs_demux_stats_view`.`clusters`
)
) / SUM(`qcs_demux_stats_view`.`clusters`)
),
2
) AS `perc_q30`
FROM
(
`gs_prod`.`qcs_demux_stats_view`
JOIN
`gs_prod`.`samples` ON(
(
`gs_prod`.`samples`.`id` = `qcs_demux_stats_view`.`sample_id`
)
)
)
WHERE
(
`qcs_demux_stats_view`.`parent_id` IN(
SELECT
`gs_prod`.`qcs`.`id`
FROM
`gs_prod`.`qcs`
WHERE
(
(`gs_prod`.`qcs`.`status` = 1) AND(
`gs_prod`.`qcs`.`deleted` = 0
)
)
) AND(
`qcs_demux_stats_view`.`status` = 1
)
)
GROUP BY
`gs_prod`.`samples`.`id`,
`qcs_demux_stats_view`.`sample_name`,
`gs_prod`.`samples`.`customer_name`
So I'm getting a result like this:
id customer_name sample_name SUM SUM_YIELD perc_q30
41453 103312-001-005-BC105 103312-001-005-BC105 7 0 88.27
41485 103312-001-005-BC137 103312-001-005-BC137 285 0 93.31
41517 103312-001-005-BC169 103312-001-005-BC169 270 0 91.46
But I would also like to have lines like this (where there is no data from qcs_demux_stats):
41517 103312-001-005-BC169 103312-001-005-BC169 0 0 NaN
The left 3 columns come from the samples table, the first one is the id that matches in the ON clause, and the right 3 columns are grouped data from qcs_demux_stats table.
The conditions in the WHERE clause require columns from qcs_demux_stats_view to be non-NULL, which makes an OUTER join equivalent to an INNER join.
One way to think about how an OUTER join operates:
When no matching row is found, the query generates a dummy row consisting of all NULL values. This dummy "matching" row allows the row from the driving table to be returned.
If we include a requirement that a column from the dummy row be non-NULL, then that row will be excluded. That essentially throws out all of the generated dummy rows, rendering the result equivalent to an inner join.
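To see the effect in isolation, here is a minimal sketch using two hypothetical toy tables (`demo_samples` and `demo_stats` are made up for illustration and are not part of the schema in the question). It runs the same LEFT JOIN three ways: with no filter, with the filter in the WHERE clause, and with the filter in the ON clause.

```sql
-- Hypothetical toy tables, for illustration only.
CREATE TABLE demo_samples (id INT PRIMARY KEY);
CREATE TABLE demo_stats   (sample_id INT, status INT);

INSERT INTO demo_samples VALUES (1), (2), (3);
INSERT INTO demo_stats   VALUES (1, 1), (2, 0);

-- 1) Plain outer join: sample 3 has no match, so it comes back
--    paired with a generated all-NULL row.
SELECT s.id, v.status
  FROM demo_samples s
  LEFT JOIN demo_stats v ON v.sample_id = s.id
 ORDER BY s.id;
-- rows: (1, 1), (2, 0), (3, NULL)

-- 2) Filter in WHERE: "NULL = 1" is not true, so the generated row
--    for sample 3 is discarded, along with the status = 0 row.
SELECT s.id, v.status
  FROM demo_samples s
  LEFT JOIN demo_stats v ON v.sample_id = s.id
 WHERE v.status = 1
 ORDER BY s.id;
-- rows: (1, 1)

-- 3) Filter in ON: it only decides which rows count as a match,
--    so every sample is still returned.
SELECT s.id, v.status
  FROM demo_samples s
  LEFT JOIN demo_stats v ON v.sample_id = s.id
                        AND v.status = 1
 ORDER BY s.id;
-- rows: (1, 1), (2, NULL), (3, NULL)
```

Query 2 returns exactly what an INNER JOIN would, which is the behavior described in the question.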
To get outer join results, either change the conditions in the WHERE clause to allow NULL values to be returned, or relocate those conditions to the ON clause of the OUTER join.
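As a rough sketch of the first option, assuming the table and column names from the question, the WHERE clause can explicitly let the generated NULL rows through. The select list is shortened here to keep the example small.

```sql
-- Sketch only: keeps the filter in WHERE but allows unmatched samples.
SELECT s.`id`            AS `id`
     , s.`customer_name` AS `customer_name`
     , SUM(v.`clusters`) AS `SUM`
  FROM `gs_prod`.`samples` s
  LEFT
  JOIN `gs_prod`.`qcs_demux_stats_view` v
    ON v.`sample_id` = s.`id`
 WHERE v.`sample_id` IS NULL          -- no stats row matched at all
    OR (     v.`status` = 1
         AND v.`parent_id` IN ( SELECT q.`id`
                                  FROM `gs_prod`.`qcs` q
                                 WHERE q.`status` = 1
                                   AND q.`deleted` = 0
                              )
       )
 GROUP
    BY s.`id`
     , s.`customer_name`
```

Note that the two options are not identical. With this WHERE variant, a sample that has stats rows but none that pass the filter disappears from the result. With the conditions in the ON clause, that sample is still returned, with NULL values for the stats columns.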
I recommend putting the driving table on the left side and writing the query as a LEFT outer join. (Best to leave RIGHT outer joins as academic exercises.)
SELECT s.`id`            AS `id`
     , s.`customer_name` AS `customer_name`
     , v.`sample_name`   AS `sample_name`
     , FORMAT( SUM(v.`clusters`) ,0) AS `SUM`
     , FORMAT( SUM(v.`yield`) ,0)    AS `SUM_YIELD`
     , ROUND( SUM( v.`perc_q30` * v.`clusters` )
              / SUM( v.`clusters` )
            ,2
       ) AS `perc_q30`
  FROM `gs_prod`.`samples` s
  LEFT
  JOIN `gs_prod`.`qcs_demux_stats_view` v
    ON v.`sample_id` = s.`id`
   AND v.`parent_id` IN ( SELECT q.`id`
                            FROM `gs_prod`.`qcs` q
                           WHERE q.`status` = 1
                             AND q.`deleted` = 0
                        )
   AND v.`status` = 1
 GROUP
    BY s.`id`
     , v.`sample_name`
     , s.`customer_name`
This effectively says: get me all rows from samples, along with any matching rows from qcs_demux_stats_view.
If no matching row is found in qcs_demux_stats_view, return the row from samples (the driving table on the LEFT side) anyway. For those rows, the values of the columns from qcs_demux_stats_view will be NULL.
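The desired output in the question shows 0 rather than an empty value for samples without stats. One possible way to get that, sketched here with the names from the question, is to wrap the sums in COALESCE so that a NULL total is displayed as 0.

```sql
-- Sketch only: same LEFT JOIN, with NULL totals shown as 0.
SELECT s.`id`            AS `id`
     , s.`customer_name` AS `customer_name`
     , v.`sample_name`   AS `sample_name`   -- NULL when no stats row matched
     , FORMAT( COALESCE(SUM(v.`clusters`), 0) ,0) AS `SUM`
     , FORMAT( COALESCE(SUM(v.`yield`), 0) ,0)    AS `SUM_YIELD`
     , ROUND( SUM( v.`perc_q30` * v.`clusters` )
              / SUM( v.`clusters` )
            ,2
       ) AS `perc_q30`                      -- stays NULL when no stats row matched
  FROM `gs_prod`.`samples` s
  LEFT
  JOIN `gs_prod`.`qcs_demux_stats_view` v
    ON v.`sample_id` = s.`id`
   AND v.`parent_id` IN ( SELECT q.`id`
                            FROM `gs_prod`.`qcs` q
                           WHERE q.`status` = 1
                             AND q.`deleted` = 0
                        )
   AND v.`status` = 1
 GROUP
    BY s.`id`
     , v.`sample_name`
     , s.`customer_name`
```

MySQL has no NaN value, so for samples without stats `perc_q30` comes back as NULL rather than NaN. The `sample_name` column is also NULL for those rows, because it comes from qcs_demux_stats_view.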