SQL query to find all combinations of grouped values

I am looking for a SQL query or a series of SQL queries.

Schema

  • I have a logging table with three columns: id, event_type, and timestamp
  • The IDs are arbitrary text, generated randomly at runtime and unknown to me
  • The event types are numbers from a finite collection of known event types
  • The timestamps are typical int64 epoch timestamps
  • A single ID value may have one or more rows, each with some value for event_type, representing a flow of events associated with the same ID
  • For each ID, its collection of rows can be sorted by increasing timestamp
  • Most of the time there will be only one occurrence of an ID + event type combination, but rarely there could be two; I am not sure whether this matters

Goal

I want to count how many IDs share each distinct combination of event types (sorted by timestamp). For example, given this table:

id          event_type          timestamp
-----------------------------------------
foo         event_1             101
foo         event_2             102
bar         event_2             102
bar         event_1             101
foo         event_3             103
bar         event_3             103
blah        event_1             101
bleh        event_2             102
backwards   event_1             103
backwards   event_2             102
backwards   event_3             101

Then I should get the following result:

combination               count
-------------------------------
[event_1,event_2,event_3]   2    // foo and bar
[event_3,event_2,event_1]   1    // backwards
[event_1]                   1    // blah
[event_2]                   1    // bleh
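To make the expected result concrete, here is a minimal Python sketch of the same computation on the sample rows. It is only an illustration of the goal, not a SQL solution; the function name `count_combinations` is hypothetical, and ties on timestamp are assumed to be broken by event type name.

```python
from collections import Counter, defaultdict

# Sample rows from the table above: (id, event_type, timestamp)
rows = [
    ("foo", "event_1", 101),
    ("foo", "event_2", 102),
    ("bar", "event_2", 102),
    ("bar", "event_1", 101),
    ("foo", "event_3", 103),
    ("bar", "event_3", 103),
    ("blah", "event_1", 101),
    ("bleh", "event_2", 102),
    ("backwards", "event_1", 103),
    ("backwards", "event_2", 102),
    ("backwards", "event_3", 101),
]


def count_combinations(rows):
    """Count how many IDs share each timestamp-ordered sequence of event types."""
    events_by_id = defaultdict(list)
    for id_, event_type, ts in rows:
        events_by_id[id_].append((ts, event_type))
    # Level 1: one sequence per ID, ordered by timestamp
    # (assumption: equal timestamps are ordered by event type name).
    sequences = (
        tuple(event for _, event in sorted(events))
        for events in events_by_id.values()
    )
    # Level 2: count identical sequences across IDs.
    return Counter(sequences)


combination_counts = count_combinations(rows)
for combo, n in combination_counts.most_common():
    print(list(combo), n)
```

The two steps (build one ordered sequence per ID, then count identical sequences) are the shape any SQL solution would also need.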

You can apply two levels of grouping to your data: first build one timestamp-ordered combination per id, then count identical combinations.
For MySQL use group_concat():

select t.combination, count(*) count
from (
  select
    group_concat(event_type order by timestamp) combination
  from tablename
  group by id
) t
group by t.combination
order by count desc

See the demo.
For PostgreSQL use array_agg() with array_to_string():

select t.combination, count(*) count
from (
  select
    array_to_string(array_agg(event_type order by timestamp), ',') combination
  from tablename
  group by id
) t
group by t.combination
order by count desc

See the demo.
For Oracle there is listagg():

select t.combination, count(*) count
from (
  select
    listagg(event_type, ',') within group (order by timestamp) combination
  from tablename
  group by id
) t
group by t.combination
order by count desc

See the demo.
For SQL Server 2017+ there is string_agg():

select t.combination, count(*) count
from (
  select
    string_agg(event_type, ',') within group (order by timestamp) combination
  from tablename
  group by id
) t
group by t.combination
order by count desc

See the demo.
Results:

| combination             | count |
| ----------------------- | ----- |
| event_1,event_2,event_3 | 2     |
| event_3,event_2,event_1 | 1     |
| event_1                 | 1     |
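| event_2                 | 1     |

SELECT
    combi.combination,
    COUNT(*) AS `count`
FROM
    (
        -- One comma-separated list of event types per id.
        -- There is no ORDER BY inside GROUP_CONCAT, so the order within each list is unspecified.
        SELECT
            GROUP_CONCAT(event_type SEPARATOR ',') AS combination
        FROM
            tablename  -- placeholder: substitute your table name
        GROUP BY
            id
    ) AS combi
GROUP BY
    combi.combination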

Note: the GROUP_CONCAT(... SEPARATOR ...) syntax is not standard SQL; it is DB-specific (MySQL in this case; other databases have other aggregate functions). You might need to adjust it for your DB of choice, or specify in the tags which DB you are actually using.
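To illustrate how little changes between databases, here is a rough Python sketch that assembles the two-level query from a per-dialect aggregate expression. The expressions are the ordered variants that appear earlier on this page; the names `AGGREGATES` and `build_query`, the dialect keys, and the default table name `tablename` are assumptions made for the example.

```python
# Ordered string-aggregation expression per dialect, copied from the queries
# shown earlier on this page. The dictionary keys are hypothetical labels.
AGGREGATES = {
    "mysql": "group_concat(event_type order by timestamp)",
    "postgresql": "array_to_string(array_agg(event_type order by timestamp), ',')",
    "oracle": "listagg(event_type, ',') within group (order by timestamp)",
    "sqlserver": "string_agg(event_type, ',') within group (order by timestamp)",
}


def build_query(dialect, table="tablename"):
    """Return the two-level grouping query text for the given dialect."""
    aggregate = AGGREGATES[dialect]  # raises KeyError for an unknown dialect
    return (
        "select t.combination, count(*) count\n"
        "from (\n"
        f"  select {aggregate} combination\n"
        f"  from {table}\n"
        "  group by id\n"
        ") t\n"
        "group by t.combination\n"
        "order by count desc"
    )


print(build_query("postgresql"))
```

Only the inner aggregate differs; the outer grouping and counting are the same in every variant.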

As for "sorted by timestamp", you need to define what this actually means. What is "sorted by timestamp" for a group of groups?
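One possible reading is that the event types within each id are ordered by timestamp before they are concatenated. The Python sketch below shows why that choice changes the counts, using the foo and bar rows from the question. It assumes that, without explicit ordering, values are concatenated in the order the rows happen to be listed; a real database gives no such guarantee, so this only models one possible outcome. The function name `combinations` is hypothetical.

```python
from collections import Counter

# Rows for foo and bar, in the order they are listed in the question's table
rows = [
    ("foo", "event_1", 101),
    ("foo", "event_2", 102),
    ("bar", "event_2", 102),
    ("bar", "event_1", 101),
    ("foo", "event_3", 103),
    ("bar", "event_3", 103),
]


def combinations(rows, order_by_timestamp):
    """Concatenate event types per id, then count identical strings."""
    per_id = {}
    for id_, event_type, ts in rows:
        per_id.setdefault(id_, []).append((ts, event_type))
    result = Counter()
    for events in per_id.values():
        if order_by_timestamp:
            events = sorted(events)  # order each id's events by timestamp
        result[",".join(event for _, event in events)] += 1
    return result


unordered = combinations(rows, order_by_timestamp=False)
ordered = combinations(rows, order_by_timestamp=True)
print(unordered)
print(ordered)
```

Without ordering, foo and bar end up as two different combinations; with ordering by timestamp inside each id, they collapse into one combination with a count of 2.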
