Get the first occurrence of the last different row in a group of rows in a MySQL table

Environment: server version 10.7.3-MariaDB-log

Tables:

User location history:

CREATE TABLE `location_history` (
  `id` int(10) UNSIGNED NOT NULL,
  `userId` int(10) UNSIGNED DEFAULT NULL,
  `latitude` double(10,8) DEFAULT NULL,
  `longitude` double(11,8) DEFAULT NULL,
  `createdAt` timestamp NOT NULL DEFAULT current_timestamp()
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb3;

Polygon (area) points:

CREATE TABLE `location_area_points` (
  `id` int(10) UNSIGNED NOT NULL,
  `location_area_id` int(10) UNSIGNED DEFAULT NULL,
  `area_group_id` int(10) UNSIGNED DEFAULT NULL,
  `latitude` double(10,8) DEFAULT NULL,
  `longitude` double(11,8) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb3;

What I am trying to achieve: find out how long a user has been inside an area.

For example, find out exactly when userId 1 last entered area_group_id 24.

What we have managed so far is to find which area each point falls in, using the following query:

SELECT
    location_history.id,
    location_history.userId,
    location_history.createdAt,
    s.location_area_id
FROM
    location_history
JOIN(
    SELECT
        location_area_points.location_area_id,
        ST_PolygonFromText(
            CONCAT(
                "POLYGON((",
                GROUP_CONCAT(
                    CONCAT(
                        location_area_points.latitude,
                        ' ',
                        location_area_points.longitude
                    ) SEPARATOR ', '
                ),
                "))"
            )
        ) AS polygon
    FROM
        location_area_points
    GROUP BY
        location_area_points.location_area_id
) s
ON
    ST_CONTAINS(
        s.polygon,
        POINT(
            location_history.latitude,
            location_history.longitude
        )
    )
ORDER BY
    createdAt
DESC
    

As an example, with users 1 and 6 we get the following:

id  userId  createdAt       location_area_id    
11765   1   2022-07-18 17:03:23 24  
11764   1   2022-07-18 17:03:07 24  
11763   1   2022-07-18 17:02:25 24  
11762   1   2022-07-18 17:02:16 24  
11761   1   2022-07-18 17:01:24 24  
11760   1   2022-07-18 17:00:32 24  
11759   1   2022-07-18 16:59:41 24  
11758   1   2022-07-18 16:59:40 24  <----- include in the results
11757   1   2022-07-18 16:58:49 2   
11756   1   2022-07-18 16:58:04 2   
11755   1   2022-07-18 16:57:06 2   
11754   1   2022-07-18 16:56:23 24  
11752   1   2022-07-18 16:56:14 24  
11753   1   2022-07-18 16:56:14 24  
11751   1   2022-07-18 16:54:31 24  
11750   1   2022-07-18 16:54:30 24  
11749   6   2022-07-18 16:53:39 5   
11748   6   2022-07-18 16:52:47 5   
11747   6   2022-07-18 16:51:56 5   <----- include in the results
11746   6   2022-07-18 16:51:55 24  
11744   6   2022-07-18 16:51:04 24  
11745   1   2022-07-18 16:51:04 24  
11743   1   2022-07-18 16:50:13 24  
11740   1   2022-07-18 16:49:20 24  
11738   1   2022-07-18 16:48:29 24  

Now I would like to run an additional query on the result above to find the first occurrence of the last group for each user; see the rows marked "include in the results" above.

So the final result should be:

 id userId  createdAt       location_area_id    
11758   1   2022-07-18 16:59:40 24
11747   6   2022-07-18 16:51:56 5

I apologize if my question is not well structured; I am not sure how to ask such a complicated question, and I am open to advice on or modifications to it.

@danblack commented above that this is a window function problem, but there's a catch: MySQL 8 does not support nested window functions.

What window functions do is take the result of your query, partition it by rules you state, then allow you to add extra columns to the result with information based on that grouping.

You provided the table definitions - yay! You did not provide insert statements for the test data - aww. To test my answer I will be creating a table that matches your result set above, then selecting everything from it to apply the window functions.
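For anyone who wants to follow along without a server, here is a minimal sketch of such a test table. It is an assumption-laden stand-in, not the setup actually used for this answer: it uses Python's built-in sqlite3 module instead of MariaDB, and the table name `area_hits` is made up. The rows are copied from the result set in the question.

```python
import sqlite3

# Rows copied from the question's result set:
# (id, userId, createdAt, location_area_id)
ROWS = [
    (11765, 1, "2022-07-18 17:03:23", 24),
    (11764, 1, "2022-07-18 17:03:07", 24),
    (11763, 1, "2022-07-18 17:02:25", 24),
    (11762, 1, "2022-07-18 17:02:16", 24),
    (11761, 1, "2022-07-18 17:01:24", 24),
    (11760, 1, "2022-07-18 17:00:32", 24),
    (11759, 1, "2022-07-18 16:59:41", 24),
    (11758, 1, "2022-07-18 16:59:40", 24),
    (11757, 1, "2022-07-18 16:58:49", 2),
    (11756, 1, "2022-07-18 16:58:04", 2),
    (11755, 1, "2022-07-18 16:57:06", 2),
    (11754, 1, "2022-07-18 16:56:23", 24),
    (11752, 1, "2022-07-18 16:56:14", 24),
    (11753, 1, "2022-07-18 16:56:14", 24),
    (11751, 1, "2022-07-18 16:54:31", 24),
    (11750, 1, "2022-07-18 16:54:30", 24),
    (11749, 6, "2022-07-18 16:53:39", 5),
    (11748, 6, "2022-07-18 16:52:47", 5),
    (11747, 6, "2022-07-18 16:51:56", 5),
    (11746, 6, "2022-07-18 16:51:55", 24),
    (11744, 6, "2022-07-18 16:51:04", 24),
    (11745, 1, "2022-07-18 16:51:04", 24),
    (11743, 1, "2022-07-18 16:50:13", 24),
    (11740, 1, "2022-07-18 16:49:20", 24),
    (11738, 1, "2022-07-18 16:48:29", 24),
]


def build_test_table(conn):
    """Create and fill the hypothetical `area_hits` table."""
    conn.execute(
        "CREATE TABLE area_hits ("
        " id INTEGER PRIMARY KEY,"
        " userId INTEGER NOT NULL,"
        " createdAt TEXT NOT NULL,"
        " location_area_id INTEGER NOT NULL)"
    )
    conn.executemany("INSERT INTO area_hits VALUES (?, ?, ?, ?)", ROWS)


conn = sqlite3.connect(":memory:")
build_test_table(conn)

row_count = conn.execute("SELECT COUNT(*) FROM area_hits").fetchone()[0]
users = [r[0] for r in conn.execute(
    "SELECT DISTINCT userId FROM area_hits ORDER BY userId")]
print(row_count, users)
```

Against a real MariaDB server the same rows could be loaded with ordinary INSERT statements; only the connection code would differ.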

Window functions allow you to add extra columns to the result that include information about neighboring rows.

SELECT
    location_history.id,
    location_history.userId,
    location_history.createdAt,
    s.location_area_id,
    LAG(location_area_id) OVER (PARTITION BY userId ORDER BY createdAt) as previous_area
FROM
... (the rest of your query)

This will produce a result set just like the one above, but with a column showing where that user was in the previous record.
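To see what LAG adds, here is a small sketch on a trimmed subset of the question's rows. Again this assumes SQLite (through Python's sqlite3) as a stand-in for MariaDB and a made-up table name `area_hits`. The extra `id` in the ORDER BY is my own addition, there only to make rows with identical timestamps come out in a stable order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE area_hits ("
    " id INTEGER, userId INTEGER, createdAt TEXT, location_area_id INTEGER)"
)
# Trimmed subset of the question's rows around each user's area change.
conn.executemany(
    "INSERT INTO area_hits VALUES (?, ?, ?, ?)",
    [
        (11759, 1, "2022-07-18 16:59:41", 24),
        (11758, 1, "2022-07-18 16:59:40", 24),
        (11757, 1, "2022-07-18 16:58:49", 2),
        (11755, 1, "2022-07-18 16:57:06", 2),
        (11754, 1, "2022-07-18 16:56:23", 24),
        (11747, 6, "2022-07-18 16:51:56", 5),
        (11746, 6, "2022-07-18 16:51:55", 24),
    ],
)

# LAG looks one row back within each user's rows, ordered by time.
lag_rows = conn.execute(
    """
    SELECT id, userId, location_area_id,
           LAG(location_area_id) OVER (
               PARTITION BY userId
               ORDER BY createdAt, id
           ) AS previous_area
    FROM area_hits
    ORDER BY userId, createdAt, id
    """
).fetchall()

for row in lag_rows:
    print(row)
```

The first row of each user has no predecessor, so its previous_area comes back as NULL (None in Python). Rows where location_area_id and previous_area differ are the area changes.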

Window functions are evaluated after the WHERE clause has already been applied, so you cannot filter on their output in the same SELECT; the only way to narrow the results is to wrap the query in a subquery. To get only the rows where the location changed:

SELECT * FROM (
    -- insert big query here
) AS `transitions`
WHERE location_area_id != previous_area OR previous_area IS NULL

You may not care about the case where previous_area is NULL; it is just the first record for each user. But if the user hasn't changed locations since, then that record may be relevant to you.

Now we have a list of every time the location changed.

+-------+--------+---------------------+------------------+---------------+
| id    | userId | createdAt           | location_area_id | previous_area |
+-------+--------+---------------------+------------------+---------------+
| 11758 |      1 | 2022-07-18 16:59:40 |               24 |             2 |
| 11755 |      1 | 2022-07-18 16:57:06 |                2 |            24 |
| 11747 |      6 | 2022-07-18 16:51:56 |                5 |            24 |
| 11744 |      6 | 2022-07-18 16:51:04 |               24 |          NULL |
| 11738 |      1 | 2022-07-18 16:48:29 |               24 |          NULL |
+-------+--------+---------------------+------------------+---------------+

The remaining challenge is to find the LATEST row for each user. It seems like a good time for another window function, but that's not supported in MySQL 8. We can modify the above to get that result:

SELECT MAX(id) AS last_transition FROM (
    -- insert big query here
) AS t
WHERE location_area_id != previous_area OR previous_area IS NULL
GROUP BY userId

which yields

+-----------------+
| last_transition |
+-----------------+
|           11758 |
|           11747 |
+-----------------+

A quick check shows that this agrees with the records you indicate in the question.

So now we can take the big query, join it with this result (which also uses the big query), and have the complete answer:

WITH big_query AS (
SELECT
    location_history.id,
    location_history.userId,
    location_history.createdAt,
    s.location_area_id,
    LAG(location_area_id) OVER (PARTITION BY userId ORDER BY createdAt) as previous_area
FROM
    location_history
JOIN(
    SELECT
        location_area_points.location_area_id,
        ST_PolygonFromText(
            CONCAT(
                "POLYGON((",
                GROUP_CONCAT(
                    CONCAT(
                        location_area_points.latitude,
                        ' ',
                        location_area_points.longitude
                    ) SEPARATOR ', '
                ),
                "))"
            )
        ) AS polygon
    FROM
        location_area_points
    GROUP BY
        location_area_points.location_area_id
) s
ON
    ST_CONTAINS(
        s.polygon,
        POINT(
            location_history.latitude,
            location_history.longitude
        )
    )
ORDER BY
    createdAt
DESC
)

SELECT * from big_query
  JOIN (
    SELECT MAX(id) as id FROM big_query
    WHERE location_area_id != previous_area OR previous_area IS NULL 
    GROUP BY userId
  ) as last_transitions using(id)

with the final answer

+-------+--------+---------------------+------------------+---------------+
| id    | userId | createdAt           | location_area_id | previous_area |
+-------+--------+---------------------+------------------+---------------+
| 11758 |      1 | 2022-07-18 16:59:40 |               24 |             2 |
| 11747 |      6 | 2022-07-18 16:51:56 |                5 |            24 |
+-------+--------+---------------------+------------------+---------------+

The WITH clause (a common table expression) lets you name a SELECT result and treat it like a table in its own right for the duration of the query. You might think of a better name than "big_query" :).
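Here is a sketch of the same WITH pattern that can be run end to end. It makes a few assumptions: SQLite (through Python's sqlite3) stands in for MariaDB, the made-up table `area_hits` replaces the spatial join, and only a trimmed subset of the question's rows is loaded. The point is that `big_query` is written once and referenced twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE area_hits ("
    " id INTEGER, userId INTEGER, createdAt TEXT, location_area_id INTEGER)"
)
# Trimmed subset of the question's rows around each user's area change.
conn.executemany(
    "INSERT INTO area_hits VALUES (?, ?, ?, ?)",
    [
        (11759, 1, "2022-07-18 16:59:41", 24),
        (11758, 1, "2022-07-18 16:59:40", 24),
        (11757, 1, "2022-07-18 16:58:49", 2),
        (11755, 1, "2022-07-18 16:57:06", 2),
        (11754, 1, "2022-07-18 16:56:23", 24),
        (11747, 6, "2022-07-18 16:51:56", 5),
        (11746, 6, "2022-07-18 16:51:55", 24),
    ],
)

final_rows = conn.execute(
    """
    WITH big_query AS (
        SELECT id, userId, createdAt, location_area_id,
               LAG(location_area_id) OVER (
                   PARTITION BY userId
                   ORDER BY createdAt, id
               ) AS previous_area
        FROM area_hits
    )
    SELECT big_query.*
    FROM big_query
    JOIN (
        -- latest area change per user
        SELECT MAX(id) AS id
        FROM big_query
        WHERE location_area_id != previous_area OR previous_area IS NULL
        GROUP BY userId
    ) AS last_transitions USING (id)
    ORDER BY userId
    """
).fetchall()

for row in final_rows:
    print(row)
```

Note that MAX(id) only picks the latest change if ids grow with createdAt, which holds for the sample data but is worth checking on your own table.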

Thanks to @Jerry (https://stackoverflow.com/a/73040422/2294803) for his amazing answer. I used his example, and the final query I came up with works on MariaDB 10.7.3.

For completeness:

SELECT
    *,
    MAX(id) AS last_transition
FROM
    (
        SELECT
            *
        FROM
            (
                SELECT
                    location_history.id,
                    location_history.userId,
                    location_history.createdAt,
                    s.location_area_id,
                    LAG(location_area_id) OVER(
                        PARTITION BY userId
                        ORDER BY createdAt
                    ) AS previous_area
                FROM
                    location_history
                JOIN(
                    SELECT
                        location_area_points.location_area_id,
                        ST_PolygonFromText(
                            CONCAT(
                                "POLYGON((",
                                GROUP_CONCAT(
                                    CONCAT(
                                        location_area_points.latitude,
                                        ' ',
                                        location_area_points.longitude
                                    ) SEPARATOR ', '
                                ),
                                "))"
                            )
                        ) AS POLYGON
                    FROM
                        location_area_points
                    GROUP BY
                        location_area_points.location_area_id
                ) s
                ON
                    ST_CONTAINS(
                        s.polygon,
                        POINT(
                            location_history.latitude,
                            location_history.longitude
                        )
                    )
                ORDER BY
                    createdAt DESC
            ) AS `transitions`
        WHERE
            location_area_id != previous_area OR previous_area IS NULL
    ) AS t
WHERE
    location_area_id != previous_area OR previous_area IS NULL
GROUP BY
    userId

This gives me the result:

id  userId  createdAt   location_area_id    previous_area   last_transition 
11758   1   2022-07-18 16:59:40 24  2   11758   
5121    3   2022-07-18 00:05:30 2   10  5121    
2364    4   2022-05-03 22:59:48 11  2   2364    
12978   5   2022-07-19 17:12:41 2   10  12978   
1747    12  2022-05-03 12:23:35 2   NULL    1747    
1703    14  2022-05-03 02:49:57 24  NULL    1703    
1734    17  2022-05-03 11:08:43 24  NULL    1734    
2623    24  2022-05-29 11:55:59 2   NULL    2623    
2610    25  2022-05-17 07:39:02 2   NULL    2610    
2620    29  2022-05-29 11:48:04 13  2   2620    
2629    35  2022-05-29 13:40:45 2   NULL    2629    
3215    36  2022-07-17 22:48:32 24  25  3215    
11777   41  2022-07-19 09:47:30 24  NULL    11777   
3252    42  2022-07-17 22:50:09 24  NULL    3252    

which seems to be exactly what I have been looking for!

Thanks to @Jerry
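To get from these entry rows back to the original goal (how long a user has been inside an area), the entry time can be subtracted from a reference time. The sketch below does that in plain Python. The helper name `seconds_inside` and the fixed `AS_OF` timestamp are assumptions for illustration; `AS_OF` is simply the latest timestamp in the sample data, and in practice it would be the current time.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"


def seconds_inside(entered_at, as_of):
    """Seconds elapsed between entering the area and the reference time."""
    delta = datetime.strptime(as_of, FMT) - datetime.strptime(entered_at, FMT)
    return int(delta.total_seconds())


# Entry times taken from the expected result in the question (userId -> createdAt).
entries = {
    1: "2022-07-18 16:59:40",
    6: "2022-07-18 16:51:56",
}

# Hypothetical reference time: the latest timestamp in the sample data.
AS_OF = "2022-07-18 17:03:23"

durations = {user: seconds_inside(ts, AS_OF) for user, ts in entries.items()}
print(durations)  # {1: 223, 6: 687}
```

On the database side, TIMESTAMPDIFF(SECOND, createdAt, NOW()) applied to the entry row should give the same kind of number without leaving SQL.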
