DISTINCT on results of a knex.js INNER JOIN

I have two tables, metadata and view_events. Both have config_id and config_type columns. I'm trying to select all view_events for a given user email, distinct by config_id and config_type, ordered by timestamp descending, and limited to the 10 most recent. The following knex.js code isn't working, but hopefully it expresses what I'm trying to achieve:

return dbClient<AuthenticatedUserIndexRow>(METADATA_TABLE_NAME)
    .select([
      `${METADATA_TABLE_NAME}.${METADATA_COLUMNS.CONFIG_ID}`,
      `${METADATA_TABLE_NAME}.${METADATA_COLUMNS.CONFIG_TYPE}`,
      `${METADATA_TABLE_NAME}.${METADATA_COLUMNS.DESCRIPTION}`,
      `${VIEW_EVENTS_TABLE_NAME}.${VIEW_EVENTS_COLUMNS.TIMESTAMP}`,
    ])
    .innerJoin<AuthenticatedUserIndexRow>(VIEW_EVENTS_TABLE_NAME, function innerJoinOnViewEvents() {
      this.on(
        `${METADATA_TABLE_NAME}.${METADATA_COLUMNS.STORAGE_ID}`,
        '=',
        `${VIEW_EVENTS_TABLE_NAME}.${VIEW_EVENTS_COLUMNS.CONFIG_STORAGE_ID}`,
      )
        .andOn(
          `${VIEW_EVENTS_TABLE_NAME}.${VIEW_EVENTS_COLUMNS.USER_EMAIL}`,
          '=',
          rawSql('?', [authUserEmail]),
        )
        .andOn(`${METADATA_TABLE_NAME}.${METADATA_COLUMNS.DELETED}`, '=', rawSql('?', [false]));
    })
    .distinct([
      `${METADATA_TABLE_NAME}.${METADATA_COLUMNS.CONFIG_TYPE}`,
      `${METADATA_TABLE_NAME}.${METADATA_COLUMNS.CONFIG_ID}`,
    ])
    .limit(EVENT_LIMIT)
    .orderBy(VIEW_EVENTS_COLUMNS.TIMESTAMP, 'desc');

For example, given the following tables:

view_events
+-------------+-----------+--------------------------+----------------------+
| config_type | config_id |        timestamp         |        email         |
+-------------+-----------+--------------------------+----------------------+
| a           | foo       | 2020-01-23T03:08:14.618Z | john.smith@gmail.com |
| a           | foo       | 2020-01-23T03:08:14.500Z | jane.doe@gmail.com   |
| a           | foo       | 2020-01-23T03:08:13.618Z | john.smith@gmail.com |
| a           | bar       | 2020-01-23T03:08:12.618Z | john.smith@gmail.com |
| a           | foo       | 2020-01-23T03:08:11.618Z | john.smith@gmail.com |
| b           | foo       | 2020-01-23T03:08:10.618Z | john.smith@gmail.com |
| a           | baz       | 2020-01-23T03:08:09.618Z | john.smith@gmail.com |
| a           | foo       | 2020-01-23T03:08:08.618Z | john.smith@gmail.com |
+-------------+-----------+--------------------------+----------------------+

metadata
+-------------+-----------+---------------------------+
| config_type | config_id |        description        |
+-------------+-----------+---------------------------+
| a           | foo       | Type a config with id foo |
| a           | bar       | Type a config with id bar |
| b           | foo       | Type b config with id foo |
| a           | baz       | Type a config with id baz |
+-------------+-----------+---------------------------+

I am trying to obtain the following output (given an authUserEmail of john.smith@gmail.com):

+-------------+-----------+---------------------------+
| config_type | config_id |        description        |
+-------------+-----------+---------------------------+
| a           | foo       | Type a config with id foo |
| a           | bar       | Type a config with id bar |
| b           | foo       | Type b config with id foo |
| a           | baz       | Type a config with id baz |
+-------------+-----------+---------------------------+

I'm not a SQL expert, but am generally aware that the use of SELECT and DISTINCT together here doesn't work. What's the correct approach?

Does the following roughly work for you? I used a WITH clause (a common table expression, via knex.with) so that we can grab the 10 most recent configs (max(timestamp) ... group by config) and then drop the timestamp column in the final projection. Note that the final records may not come back in exact timestamp order, since you did not want the timestamp in your final output and the outer select has no ORDER BY, but they will be the 10 most recent. I haven't added the DELETED filter, but I imagine you will re-add it based on the code in your question.

// CTE "ordered_items": one row per config this user has viewed, together with
// the time of the most recent view, newest first, capped at 10 rows.
knex.with('ordered_items', (qb) =>
  qb.table('metadata')
    .innerJoin('view_events', function () {
      this.on('metadata.config_id', '=', 'view_events.config_id')
        .andOn('metadata.config_type', '=', 'view_events.config_type');
    })
    .where({ 'view_events.email': 'john.smith@gmail.com' })
    .select(['metadata.config_type', 'metadata.config_id', 'metadata.description'])
    .max('view_events.timestamp', { as: 'max_ts' })
    .groupBy(['metadata.config_id', 'metadata.config_type', 'metadata.description'])
    .orderBy('max_ts', 'desc')
    .limit(10))
  // Final projection: drop max_ts and keep only the config columns.
  .table('ordered_items')
  .select(['config_type', 'config_id', 'description']);

My input and output:

sqlite> select * from metadata;
a|foo|Type a config with id foo
a|bar|Type a config with id bar
b|foo|Type b config with id foo
a|baz|Type a config with id baz
sqlite> select * from view_events;
a|foo|2020-01-23T03:08:14.618Z|john.smith@gmail.com
a|foo|2020-01-23T03:08:14.500Z|jane.doe@gmail.com
a|foo|2020-01-23T03:08:13.618Z|john.smith@gmail.com
a|bar|2020-01-23T03:08:12.618Z|john.smith@gmail.com
a|foo|2020-01-23T03:08:11.618Z|john.smith@gmail.com
b|foo|2020-01-23T03:08:10.618Z|john.smith@gmail.com
a|baz|2020-01-23T03:08:09.618Z|john.smith@gmail.com
a|foo|2020-01-23T03:08:08.618Z|john.smith@gmail.com

[ { config_type: 'a',
    config_id: 'foo',
    description: 'Type a config with id foo' },
  { config_type: 'a',
    config_id: 'bar',
    description: 'Type a config with id bar' },
  { config_type: 'b',
    config_id: 'foo',
    description: 'Type b config with id foo' },
  { config_type: 'a',
    config_id: 'baz',
    description: 'Type a config with id baz' } ]
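
If it helps to see what the query is computing without a database in the way, here is a rough plain-TypeScript model of the same steps (filter by email, group by config, keep the max timestamp per group, sort newest first, limit) run over the sample rows from the question. This is only a sketch: the ViewEvent and Metadata interfaces and the recentConfigs function are names I made up for illustration, not part of knex or of the code in the question.

```typescript
// In-memory model of the grouped query. All names here are hypothetical.
interface ViewEvent {
  config_type: string;
  config_id: string;
  timestamp: string; // ISO 8601 UTC, so string comparison matches time order
  email: string;
}

interface Metadata {
  config_type: string;
  config_id: string;
  description: string;
}

// Sample rows copied from the question.
const viewEvents: ViewEvent[] = [
  { config_type: 'a', config_id: 'foo', timestamp: '2020-01-23T03:08:14.618Z', email: 'john.smith@gmail.com' },
  { config_type: 'a', config_id: 'foo', timestamp: '2020-01-23T03:08:14.500Z', email: 'jane.doe@gmail.com' },
  { config_type: 'a', config_id: 'foo', timestamp: '2020-01-23T03:08:13.618Z', email: 'john.smith@gmail.com' },
  { config_type: 'a', config_id: 'bar', timestamp: '2020-01-23T03:08:12.618Z', email: 'john.smith@gmail.com' },
  { config_type: 'a', config_id: 'foo', timestamp: '2020-01-23T03:08:11.618Z', email: 'john.smith@gmail.com' },
  { config_type: 'b', config_id: 'foo', timestamp: '2020-01-23T03:08:10.618Z', email: 'john.smith@gmail.com' },
  { config_type: 'a', config_id: 'baz', timestamp: '2020-01-23T03:08:09.618Z', email: 'john.smith@gmail.com' },
  { config_type: 'a', config_id: 'foo', timestamp: '2020-01-23T03:08:08.618Z', email: 'john.smith@gmail.com' },
];

const metadata: Metadata[] = [
  { config_type: 'a', config_id: 'foo', description: 'Type a config with id foo' },
  { config_type: 'a', config_id: 'bar', description: 'Type a config with id bar' },
  { config_type: 'b', config_id: 'foo', description: 'Type b config with id foo' },
  { config_type: 'a', config_id: 'baz', description: 'Type a config with id baz' },
];

// Composite grouping key, standing in for GROUP BY config_type, config_id.
function configKey(row: { config_type: string; config_id: string }): string {
  return `${row.config_type}|${row.config_id}`;
}

function recentConfigs(
  events: ViewEvent[],
  meta: Metadata[],
  email: string,
  limit: number,
): Metadata[] {
  // WHERE email = ? ... GROUP BY config ... MAX(timestamp)
  const latest = new Map<string, string>();
  for (const e of events) {
    if (e.email !== email) continue;
    const key = configKey(e);
    const seen = latest.get(key);
    if (seen === undefined || e.timestamp > seen) latest.set(key, e.timestamp);
  }

  // INNER JOIN metadata, ORDER BY max_ts DESC, LIMIT n.
  // filter() returns a new array, so sort() does not mutate the input.
  return meta
    .filter((m) => latest.has(configKey(m)))
    .sort((a, b) => {
      const ta = latest.get(configKey(a)) as string;
      const tb = latest.get(configKey(b)) as string;
      return ta < tb ? 1 : ta > tb ? -1 : 0;
    })
    .slice(0, limit);
}

const result = recentConfigs(viewEvents, metadata, 'john.smith@gmail.com', 10);
console.log(result);
```

The reason grouping works where SELECT plus DISTINCT on two columns does not: once timestamp is in the select list, every row is already distinct, so the duplicates only collapse when each config is reduced to a single aggregated timestamp.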
