Implementing DISTINCT ON in CubeJS

I have a Postgres table like this, with device ID, timestamp, and the status of the device at that time:

dev_id  | timestamp             | status
----------------------------------------
1       | 2020-08-06 23:00:00   | 1
2       | 2020-08-06 23:00:00   | 0
3       | 2020-08-06 23:00:00   | 1
2       | 2020-08-06 23:05:00   | 1
3       | 2020-08-06 23:05:00   | 0
1       | 2020-08-06 23:10:00   | 0

I want to know, as of each device's latest timestamp, how many devices were functioning and how many were not. In Postgres, I can use DISTINCT ON and write the query like this:

SELECT status, COUNT(status) 
FROM
  (
    SELECT DISTINCT ON (dev_id) dev_id,
      timestamp,
      status 
    FROM
      sample_metrics_data 
    ORDER BY
      dev_id,
      timestamp DESC
  ) sub 
GROUP BY status; 

This will result in:

status  | count
---------------
0       | 2
1       | 1

(Two devices, #1 and #3, have a status of 0, while one device, #2, has a status of 1.) How can I build something like this in CubeJS? Is DISTINCT ON supported, and if not, what is the workaround?

Alternatively, the query can be written using an inner join:
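
For reference, the expected counts can be cross-checked outside the database. The following is a minimal sketch in plain JavaScript that mimics the "latest row per device, then count by status" logic on the sample rows above. The names `sampleRows`, `latestPerDevice`, and `countByStatus` are made up for this illustration and are not part of CubeJS or Postgres.

```javascript
// Sample rows copied from the table in the question.
const sampleRows = [
  { dev_id: 1, timestamp: "2020-08-06 23:00:00", status: 1 },
  { dev_id: 2, timestamp: "2020-08-06 23:00:00", status: 0 },
  { dev_id: 3, timestamp: "2020-08-06 23:00:00", status: 1 },
  { dev_id: 2, timestamp: "2020-08-06 23:05:00", status: 1 },
  { dev_id: 3, timestamp: "2020-08-06 23:05:00", status: 0 },
  { dev_id: 1, timestamp: "2020-08-06 23:10:00", status: 0 },
];

// Keep only the newest row per device, which is what
// DISTINCT ON (dev_id) ... ORDER BY dev_id, timestamp DESC selects.
function latestPerDevice(rows) {
  const latest = new Map();
  for (const row of rows) {
    const seen = latest.get(row.dev_id);
    // Timestamps in the fixed "YYYY-MM-DD HH:MM:SS" format sort correctly as strings.
    if (!seen || row.timestamp > seen.timestamp) {
      latest.set(row.dev_id, row);
    }
  }
  return [...latest.values()];
}

// Count rows per status value (the outer GROUP BY status).
function countByStatus(rows) {
  const counts = {};
  for (const { status } of rows) {
    counts[status] = (counts[status] || 0) + 1;
  }
  return counts;
}

const statusCounts = countByStatus(latestPerDevice(sampleRows));
console.log(statusCounts); // { '0': 2, '1': 1 }
```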

SELECT status,
       Count(status)
FROM   sample_metrics_data
       JOIN (SELECT dev_id         id,
                    Max(timestamp) ts
             FROM   sample_metrics_data
             GROUP  BY dev_id) max_ts
         ON timestamp = max_ts.ts
            AND dev_id = max_ts.id
GROUP BY status; 

That would require an inner join, but it seems only LEFT JOIN is available in CubeJS.

In your case, if you need to build a chart of how many devices were online, a typical solution would be:

  1. Build a cube that contains the change in the number of online devices at each timestamp.
  2. Create measures with rollingWindow.
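
To illustrate why these two steps give the number of online devices, here is a minimal sketch in plain JavaScript, using the column names from the question (`dev_id`, `timestamp`, `status`). It is only an illustration of the idea, not CubeJS code; `readings`, `statusDeltas`, and `onlineOverTime` are hypothetical names.

```javascript
// Sample rows copied from the table in the question.
const readings = [
  { dev_id: 1, timestamp: "2020-08-06 23:00:00", status: 1 },
  { dev_id: 2, timestamp: "2020-08-06 23:00:00", status: 0 },
  { dev_id: 3, timestamp: "2020-08-06 23:00:00", status: 1 },
  { dev_id: 2, timestamp: "2020-08-06 23:05:00", status: 1 },
  { dev_id: 3, timestamp: "2020-08-06 23:05:00", status: 0 },
  { dev_id: 1, timestamp: "2020-08-06 23:10:00", status: 0 },
];

// Step 1: for every reading, compute the change in status relative to the
// same device's previous reading (0 if there is none). This mirrors
// status - COALESCE(LAG(status) OVER (PARTITION BY dev_id ORDER BY timestamp), 0).
function statusDeltas(rows) {
  const sorted = [...rows].sort((a, b) => a.timestamp.localeCompare(b.timestamp));
  const previous = new Map();
  return sorted.map((row) => {
    const delta = row.status - (previous.get(row.dev_id) ?? 0);
    previous.set(row.dev_id, row.status);
    return { timestamp: row.timestamp, delta };
  });
}

// Step 2: a running (unbounded trailing) sum of the deltas gives the number
// of devices online as of each timestamp.
function onlineOverTime(deltas) {
  const totals = new Map();
  let running = 0;
  for (const { timestamp, delta } of deltas) {
    running += delta;
    totals.set(timestamp, running); // last write per timestamp wins
  }
  return [...totals.entries()];
}

const onlineSeries = onlineOverTime(statusDeltas(readings));
console.log(onlineSeries); // 23:00 -> 2, 23:05 -> 2, 23:10 -> 1
```

The final value, 1, matches the number of devices whose latest status is 1 in the question's expected result.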

For example, I created a table like the one in your question:

[Image: sample_metrics table structure]

And created this cube:

cube(`SampleMetricsData`, {
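  // rolling_status is the change in status since the device's previous reading:
  // +1 when it comes online, -1 when it goes offline, 0 when unchanged.
  sql: `
    SELECT *,
      device_status - COALESCE(LAG(device_status) OVER (PARTITION BY id ORDER BY timemark ASC), 0) AS rolling_status
    FROM ab_api_test.sample_metrics
    ORDER BY sample_metrics.timemark DESC
  `,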
   
  measures: { 
    rollingStatusTotal: {
      sql: `rolling_status`,
      type: `sum`, 
      rollingWindow: { 
        trailing: `unbounded`, 
      }, 
    },  
  },
  
  dimensions: {
    id: {
      sql: `id`,
      type: `number`,
      primaryKey: true
    },
    
    timemark: {
      sql: `timemark`,
      type: `time`
    }, 
  }
});

With this cube, you can chart the number of online devices using this query:

{
  "measures": ["SampleMetricsData.rollingStatusTotal"],
  "timeDimensions": [
    {
      "dimension": "SampleMetricsData.timemark",
      "granularity": "hour",
      "dateRange": "This month"
    }
  ],
  "order": {},
  "dimensions": [],
  "filters": []
}
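
If you want to send this query over the REST API rather than through the playground, a sketch like the following builds the request URL for the `/cubejs-api/v1/load` endpoint. The base URL `http://localhost:4000` is an assumption (adjust it to your deployment), `buildLoadUrl` is a hypothetical helper, and the request itself would also need your API token in the `Authorization` header.

```javascript
// The same query as above, as a JavaScript object.
const onlineDevicesQuery = {
  measures: ["SampleMetricsData.rollingStatusTotal"],
  timeDimensions: [
    {
      dimension: "SampleMetricsData.timemark",
      granularity: "hour",
      dateRange: "This month",
    },
  ],
  order: {},
  dimensions: [],
  filters: [],
};

// Hypothetical helper: serializes the query into the `query` URL parameter.
function buildLoadUrl(baseUrl, query) {
  const encoded = encodeURIComponent(JSON.stringify(query));
  return `${baseUrl}/cubejs-api/v1/load?query=${encoded}`;
}

// Assumed base URL of a locally running instance.
const loadUrl = buildLoadUrl("http://localhost:4000", onlineDevicesQuery);
console.log(loadUrl);
```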

You may also want to look at this tutorial, which covers something similar to your task. There is one more related question here.

Note

You can also write a query like this to create a cube from your data, but this is not a best practice:

select * from (
     SELECT DISTINCT ON (dev_id) dev_id,
       timestamp,
       status
     FROM
       sample_metrics_data
     ORDER BY
       dev_id,
       timestamp DESC
) as sample_metrics
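
As a rough sketch of how that query could back a cube, the definition below uses the DISTINCT ON query as the cube's `sql` and exposes a count measure grouped by status. The cube name `LatestDeviceStatus`, the measure name `deviceCount`, and the variable names are made up for this example, and it has not been checked against a running Cube instance.

```javascript
// Hypothetical cube definition built on the DISTINCT ON query above.
const latestDeviceStatus = {
  sql: `
    SELECT DISTINCT ON (dev_id) dev_id, timestamp, status
    FROM sample_metrics_data
    ORDER BY dev_id, timestamp DESC
  `,

  measures: {
    // Number of devices, one row per device after DISTINCT ON.
    deviceCount: { type: `count` },
  },

  dimensions: {
    devId: { sql: `dev_id`, type: `number`, primaryKey: true },
    status: { sql: `status`, type: `number` },
  },
};

// In a Cube schema file, where cube() is provided as a global, register it with:
// cube(`LatestDeviceStatus`, latestDeviceStatus);

// Query that groups the device count by latest status.
const latestStatusQuery = {
  measures: ["LatestDeviceStatus.deviceCount"],
  dimensions: ["LatestDeviceStatus.status"],
};
```

On the sample data from the question, this should correspond to the result of the original Postgres query: a count of 2 for status 0 and 1 for status 1.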
