Using JOIN with DISTINCT and prioritizing one table

I am trying to combine data from two tables.
Both tables contain data from the same sensor (let's say a sensor that measures CO2, with one entry every 10 minutes).

The first table contains validated data; let's call it station1_validated. The second table contains raw data; let's call it station1_nrt.

While the raw-data table contains live data, the validated table only contains data points that are at least one month old. (Validating the data and checking it manually afterwards takes some time, and this happens only once a month.)

What I am trying to do now is combine the data from those two tables to display live data on a website. However, when validated data is available, that data point should take priority over the raw data point.

The relevant columns for this are:

  • timed [bigint(20)]: the date and time as a Unix timestamp in milliseconds since 1970-01-01
  • CO2 [double]: the measured CO2 concentration in ppm (parts per million)
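
For reference, the millisecond bounds used in the query can be turned into readable dates. This is only a small sketch in Python; the helper name ms_to_utc is made up for illustration and is not part of the original setup.

```python
from datetime import datetime, timezone


def ms_to_utc(timed_ms: int) -> datetime:
    """Convert a Unix timestamp in milliseconds to an aware UTC datetime."""
    return datetime.fromtimestamp(timed_ms / 1000, tz=timezone.utc)


# The two bounds that appear in the WHERE clause of the question
start = ms_to_utc(1386932400000)
end = ms_to_utc(1386939600000)

print(start.isoformat())  # 2013-12-13T11:00:00+00:00
print(end.isoformat())    # 2013-12-13T13:00:00+00:00
```

So the example window covers two hours, which at one entry per 10 minutes is roughly a dozen rows per table.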

I wrote this basic SQL:

SELECT 
    *
FROM
    (SELECT 
        timed, CO2, '2' tab
    FROM
        station1_nrt
    WHERE
        TIMED >= 1386932400000
            AND TIMED <= 1386939600000
            AND TIMED NOT IN (SELECT 
                timed
            FROM
                station1_nrt
            WHERE
                CO2 IS NOT NULL
                    AND TIMED >= 1386932400000
                    AND TIMED <= 1386939600000) UNION SELECT 
        timed, CO2, '1' tab
    FROM
        station1_validated
    WHERE
        CO2 IS NOT NULL
            AND TIMED >= 1386932400000
            AND TIMED <= 1386939600000) a
ORDER BY timed

This does not work correctly, as it selects only those data points where both tables have an entry. I would also like to do this with a JOIN, as it should be much faster. However, I don't know how to do a JOIN with a DISTINCT (or something similar) while prioritizing one table. Could someone help me out with this (or explain it)?

You haven't mentioned whether station1_validated contains records that don't exist in station1_nrt, so I use a FULL JOIN. If all rows from station1_validated also exist in station1_nrt, you can use a LEFT JOIN instead.

Something like this:

SELECT IFNULL(n.timed,v.timed) as timed,
       CASE WHEN v.timed IS NOT NULL THEN v.CO2 ELSE n.CO2 END as CO2,
       CASE WHEN v.timed IS NOT NULL THEN '1' ELSE '2' END as tab

FROM station1_nrt as n
FULL JOIN station1_validated as v ON n.timed=v.timed AND v.CO2 IS NOT NULL
    WHERE
        ( n.TIMED between 1386932400000 AND 1386939600000
          or 
          v.TIMED between 1386932400000 AND 1386939600000
        )
        AND 
        (n.CO2 IS NOT NULL OR v.CO2 IS NOT NULL)
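
The LEFT JOIN variant mentioned above can be tried out on a small scale. The following is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL; the sample values are made up, and it assumes every validated row also exists in station1_nrt.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE station1_nrt       (timed INTEGER PRIMARY KEY, CO2 REAL);
    CREATE TABLE station1_validated (timed INTEGER PRIMARY KEY, CO2 REAL);
    -- Made-up sample values, 10 minutes (600000 ms) apart
    INSERT INTO station1_nrt VALUES
        (1386932400000, 400.0), (1386933000000, 401.0), (1386933600000, 402.0);
    INSERT INTO station1_validated VALUES
        (1386932400000, 399.5),   -- validated value replaces the raw one
        (1386933000000, NULL);    -- no validated value yet, raw one is kept
""")

rows = conn.execute("""
    SELECT n.timed,
           CASE WHEN v.timed IS NOT NULL THEN v.CO2 ELSE n.CO2 END AS CO2,
           CASE WHEN v.timed IS NOT NULL THEN '1' ELSE '2' END AS tab
    FROM station1_nrt AS n
    LEFT JOIN station1_validated AS v
           ON n.timed = v.timed AND v.CO2 IS NOT NULL
    WHERE n.timed BETWEEN 1386932400000 AND 1386939600000
      AND (n.CO2 IS NOT NULL OR v.CO2 IS NOT NULL)
    ORDER BY n.timed
""").fetchall()

for row in rows:
    print(row)
```

Because the NULL check sits in the join condition, a validated row without a CO2 value simply does not match, and the raw value is used for that timestamp.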

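You can join and then use IFNULL in the select list to pick the validated values where they exist. Something like:

SELECT
    s1.timed AS timed,
    -- Take the validated value if there is one, otherwise the raw value
    IFNULL(s1val.CO2, s1.CO2) AS CO2,
    -- '1' = validated, '2' = raw (same markers as in the question)
    IF(s1val.CO2 IS NULL, '2', '1') AS tab
FROM
    station1_nrt s1
    LEFT JOIN station1_validated s1val ON (s1.timed = s1val.timed)
WHERE
    -- Any necessary WHERE conditions, e.g. the time range
    s1.timed BETWEEN 1386932400000 AND 1386939600000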

MySQL has an IF that would probably work for you. You would have to select specific columns, though, but you could build the query programmatically.

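SELECT
    -- Rows older than one month may already be validated; prefer that value when present.
    -- TIMED is in milliseconds, FROM_UNIXTIME expects seconds.
    IF(FROM_UNIXTIME(nrt.TIMED / 1000) < DATE_SUB(NOW(), INTERVAL 1 MONTH)
            AND val.value IS NOT NULL,
        val.value,
        nrt.value
    ) AS value
    -- Similar for other values
FROM
    station1_nrt AS nrt
    LEFT JOIN station1_validated AS val USING(id)
ORDER BY nrt.TIMED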

Note that the USING(id) is a placeholder. Presumably there is some indexed column you can join the two tables on.

@Jim, @valex, @ExplosionPills: I managed to write a SQL SELECT that emulates a FULL OUTER JOIN (as there is no FULL JOIN in MySQL) and returns the validated value if it exists. If no validated data is available, it returns the raw value.

So this is the SQL I am using now:

SET @StartTime  = 1356998400000;
SET @EndTime    = 1386546000000;

SELECT
    timed,
    IFNULL(mergedData.validatedValue, mergedData.rawValue) as value
FROM
((SELECT 
    from_unixtime(timed / 1000) as timed,
    rawData.NOX as rawValue,
    validatedData.NOX as validatedValue
FROM
    nabelnrt_bas as rawData
    LEFT JOIN nabelvalidated_bas as validatedData using(timed)
WHERE 
    (rawData.timed > @StartTime
    AND rawData.timed < @EndTime)
    OR (validatedData.timed > @StartTime
    AND validatedData.timed < @EndTime)

) UNION (
SELECT 
    from_unixtime(timed / 1000) as timed,
    rawData.NOX as rawValue,
    validatedData.NOX as validatedValue
FROM
    nabelnrt_bas as rawData
    RIGHT JOIN nabelvalidated_bas as validatedData using(timed)
WHERE 
    (rawData.timed > @StartTime
    AND rawData.timed < @EndTime)
    OR (validatedData.timed > @StartTime
    AND validatedData.timed < @EndTime)
)
) as mergedData
-- Sorting in the outer query makes the order of the final result explicit
ORDER BY timed DESC
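
The same idea can be checked on a few rows. This is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL, with made-up sample values and the table names from the question rather than the real tables. Instead of a RIGHT JOIN, the second branch is written as a LEFT JOIN with the tables swapped, which should be equivalent here and also runs on older SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE station1_nrt       (timed INTEGER PRIMARY KEY, CO2 REAL);
    CREATE TABLE station1_validated (timed INTEGER PRIMARY KEY, CO2 REAL);
    INSERT INTO station1_nrt VALUES
        (1386932400000, 400.0), (1386933000000, 401.0);
    INSERT INTO station1_validated VALUES
        (1386932400000, 399.5),   -- present in both tables
        (1386933600000, 402.5);   -- present only in the validated table
""")

query = """
    SELECT timed, IFNULL(validatedValue, rawValue) AS value
    FROM (
        -- All raw rows, with the validated value attached where one exists
        SELECT r.timed AS timed, r.CO2 AS rawValue, v.CO2 AS validatedValue
        FROM station1_nrt AS r
        LEFT JOIN station1_validated AS v ON v.timed = r.timed
        WHERE r.timed BETWEEN :t_start AND :t_end
        UNION
        -- All validated rows, including those without a raw counterpart
        SELECT v.timed AS timed, r.CO2 AS rawValue, v.CO2 AS validatedValue
        FROM station1_validated AS v
        LEFT JOIN station1_nrt AS r ON r.timed = v.timed
        WHERE v.timed BETWEEN :t_start AND :t_end
    ) AS mergedData
    ORDER BY timed
"""
rows = conn.execute(
    query, {"t_start": 1386932400000, "t_end": 1386939600000}
).fetchall()

for row in rows:
    print(row)
```

UNION (without ALL) removes the rows that both branches produce for timestamps present in both tables, so each timestamp appears once in the result.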
