Get first occurrence of last different row in a group of rows in MySQL table
Environment: Server version: 10.7.3-MariaDB-log
CREATE TABLE `location_history` (
`id` int(10) UNSIGNED NOT NULL,
`userId` int(10) UNSIGNED DEFAULT NULL,
`latitude` double(10,8) DEFAULT NULL,
`longitude` double(11,8) DEFAULT NULL,
`createdAt` timestamp NOT NULL DEFAULT current_timestamp()
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb3;
CREATE TABLE `location_area_points` (
`id` int(10) UNSIGNED NOT NULL,
`location_area_id` int(10) UNSIGNED DEFAULT NULL,
`area_group_id` int(10) UNSIGNED DEFAULT NULL,
`latitude` double(10,8) DEFAULT NULL,
`longitude` double(11,8) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb3;
What I am trying to achieve: find out how long a user has been inside an area,
for example, find out exactly when userId 1 last entered area_group_id 24.
So far, we have managed to find which area each point is in, using the following query:
SELECT
    location_history.id,
    location_history.userId,
    location_history.createdAt,
    s.location_area_id
FROM
    location_history
JOIN (
    SELECT
        location_area_points.location_area_id,
        ST_PolygonFromText(
            CONCAT(
                "POLYGON((",
                GROUP_CONCAT(
                    CONCAT(
                        location_area_points.latitude,
                        ' ',
                        location_area_points.longitude
                    ) SEPARATOR ', '
                ),
                "))"
            )
        ) AS polygon
    FROM
        location_area_points
    GROUP BY
        location_area_points.location_area_id
) s
ON
    ST_CONTAINS(
        s.polygon,
        POINT(
            location_history.latitude,
            location_history.longitude
        )
    )
ORDER BY
    createdAt DESC
We get the following, for example, for users 1 and 6:
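+-------+--------+---------------------+------------------+
| id    | userId | createdAt           | location_area_id |
+-------+--------+---------------------+------------------+
| 11765 |      1 | 2022-07-18 17:03:23 |               24 |
| 11764 |      1 | 2022-07-18 17:03:07 |               24 |
| 11763 |      1 | 2022-07-18 17:02:25 |               24 |
| 11762 |      1 | 2022-07-18 17:02:16 |               24 |
| 11761 |      1 | 2022-07-18 17:01:24 |               24 |
| 11760 |      1 | 2022-07-18 17:00:32 |               24 |
| 11759 |      1 | 2022-07-18 16:59:41 |               24 |
| 11758 |      1 | 2022-07-18 16:59:40 |               24 | <----- include in the results
| 11757 |      1 | 2022-07-18 16:58:49 |                2 |
| 11756 |      1 | 2022-07-18 16:58:04 |                2 |
| 11755 |      1 | 2022-07-18 16:57:06 |                2 |
| 11754 |      1 | 2022-07-18 16:56:23 |               24 |
| 11752 |      1 | 2022-07-18 16:56:14 |               24 |
| 11753 |      1 | 2022-07-18 16:56:14 |               24 |
| 11751 |      1 | 2022-07-18 16:54:31 |               24 |
| 11750 |      1 | 2022-07-18 16:54:30 |               24 |
| 11749 |      6 | 2022-07-18 16:53:39 |                5 |
| 11748 |      6 | 2022-07-18 16:52:47 |                5 |
| 11747 |      6 | 2022-07-18 16:51:56 |                5 | <----- include in the results
| 11746 |      6 | 2022-07-18 16:51:55 |               24 |
| 11744 |      6 | 2022-07-18 16:51:04 |               24 |
| 11745 |      1 | 2022-07-18 16:51:04 |               24 |
| 11743 |      1 | 2022-07-18 16:50:13 |               24 |
| 11740 |      1 | 2022-07-18 16:49:20 |               24 |
| 11738 |      1 | 2022-07-18 16:48:29 |               24 |
+-------+--------+---------------------+------------------+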
Now I would like to run an additional query on the result above to find the first occurrence of the last group for each user (see the rows marked "include in the results" above).
So the final result should be:
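+-------+--------+---------------------+------------------+
| id    | userId | createdAt           | location_area_id |
+-------+--------+---------------------+------------------+
| 11758 |      1 | 2022-07-18 16:59:40 |               24 |
| 11747 |      6 | 2022-07-18 16:51:56 |                5 |
+-------+--------+---------------------+------------------+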
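The expected output is a classic "gaps and islands" result: per user, walk the rows in time order, start a new run whenever location_area_id changes, and keep the first row of the final run. A minimal plain-Python sketch of that logic, assuming the rows above are available as (id, userId, createdAt, location_area_id) tuples; the function name first_row_of_last_run is hypothetical:

```python
from itertools import groupby

# Sample rows from the result above: (id, userId, createdAt, location_area_id).
ROWS = [
    (11765, 1, "2022-07-18 17:03:23", 24),
    (11764, 1, "2022-07-18 17:03:07", 24),
    (11763, 1, "2022-07-18 17:02:25", 24),
    (11762, 1, "2022-07-18 17:02:16", 24),
    (11761, 1, "2022-07-18 17:01:24", 24),
    (11760, 1, "2022-07-18 17:00:32", 24),
    (11759, 1, "2022-07-18 16:59:41", 24),
    (11758, 1, "2022-07-18 16:59:40", 24),
    (11757, 1, "2022-07-18 16:58:49", 2),
    (11756, 1, "2022-07-18 16:58:04", 2),
    (11755, 1, "2022-07-18 16:57:06", 2),
    (11754, 1, "2022-07-18 16:56:23", 24),
    (11752, 1, "2022-07-18 16:56:14", 24),
    (11753, 1, "2022-07-18 16:56:14", 24),
    (11751, 1, "2022-07-18 16:54:31", 24),
    (11750, 1, "2022-07-18 16:54:30", 24),
    (11749, 6, "2022-07-18 16:53:39", 5),
    (11748, 6, "2022-07-18 16:52:47", 5),
    (11747, 6, "2022-07-18 16:51:56", 5),
    (11746, 6, "2022-07-18 16:51:55", 24),
    (11744, 6, "2022-07-18 16:51:04", 24),
    (11745, 1, "2022-07-18 16:51:04", 24),
    (11743, 1, "2022-07-18 16:50:13", 24),
    (11740, 1, "2022-07-18 16:49:20", 24),
    (11738, 1, "2022-07-18 16:48:29", 24),
]


def first_row_of_last_run(rows):
    """Return {userId: row}, where row is the first row of that user's
    final run of consecutive rows in the same location_area_id."""
    # Sort by user, then chronologically; id breaks ties between equal timestamps.
    # "YYYY-MM-DD HH:MM:SS" strings sort correctly as plain text.
    ordered = sorted(rows, key=lambda r: (r[1], r[2], r[0]))
    result = {}
    for user_id, user_rows in groupby(ordered, key=lambda r: r[1]):
        run_start, prev_area = None, None
        for row in user_rows:
            if run_start is None or row[3] != prev_area:
                run_start = row  # first row or area changed: a new run starts here
            prev_area = row[3]
        result[user_id] = run_start
    return result


for user_id, row in sorted(first_row_of_last_run(ROWS).items()):
    print(user_id, row)
```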
I apologize if my question is not well structured; I am not sure how to ask such a complicated question, and I am open to advice or modifications to the question at hand.
@danblack commented above that this is a window function problem, but there's a catch: MySQL 8 does not support nested window functions.
What window functions do is take the result of your query, partition it by rules you state, then allow you to add extra columns to the result with information based on that grouping.
You provided the table definitions - yay! You did not provide insert statements for the test data - aww. To test my answer I created a table that matches your result set above, then selected from it to apply the window functions.
Window functions allow you to add extra columns to the result that include information about neighboring rows.
SELECT
    location_history.id,
    location_history.userId,
    location_history.createdAt,
    s.location_area_id,
    LAG(location_area_id) OVER (PARTITION BY userId ORDER BY createdAt) AS previous_area
FROM
    ... (the rest of your query)
This will produce a result set just like the one above, but with a column for where that user was in the previous record.
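To see what LAG adds without a MariaDB server, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in (it assumes SQLite 3.25 or newer, which supports LAG ... OVER). The table name area_hits is hypothetical and simply holds the sample rows from the question, i.e. the output of the polygon query. id is added to ORDER BY as a tie-breaker for rows with the same createdAt.

```python
import sqlite3

# Sample rows from the question: (id, userId, createdAt, location_area_id).
ROWS = [
    (11765, 1, "2022-07-18 17:03:23", 24),
    (11764, 1, "2022-07-18 17:03:07", 24),
    (11763, 1, "2022-07-18 17:02:25", 24),
    (11762, 1, "2022-07-18 17:02:16", 24),
    (11761, 1, "2022-07-18 17:01:24", 24),
    (11760, 1, "2022-07-18 17:00:32", 24),
    (11759, 1, "2022-07-18 16:59:41", 24),
    (11758, 1, "2022-07-18 16:59:40", 24),
    (11757, 1, "2022-07-18 16:58:49", 2),
    (11756, 1, "2022-07-18 16:58:04", 2),
    (11755, 1, "2022-07-18 16:57:06", 2),
    (11754, 1, "2022-07-18 16:56:23", 24),
    (11752, 1, "2022-07-18 16:56:14", 24),
    (11753, 1, "2022-07-18 16:56:14", 24),
    (11751, 1, "2022-07-18 16:54:31", 24),
    (11750, 1, "2022-07-18 16:54:30", 24),
    (11749, 6, "2022-07-18 16:53:39", 5),
    (11748, 6, "2022-07-18 16:52:47", 5),
    (11747, 6, "2022-07-18 16:51:56", 5),
    (11746, 6, "2022-07-18 16:51:55", 24),
    (11744, 6, "2022-07-18 16:51:04", 24),
    (11745, 1, "2022-07-18 16:51:04", 24),
    (11743, 1, "2022-07-18 16:50:13", 24),
    (11740, 1, "2022-07-18 16:49:20", 24),
    (11738, 1, "2022-07-18 16:48:29", 24),
]

conn = sqlite3.connect(":memory:")
# Hypothetical table standing in for the result of the polygon-matching query.
conn.execute(
    "CREATE TABLE area_hits "
    "(id INTEGER, userId INTEGER, createdAt TEXT, location_area_id INTEGER)"
)
conn.executemany("INSERT INTO area_hits VALUES (?, ?, ?, ?)", ROWS)

rows = conn.execute(
    """
    SELECT id, userId, createdAt, location_area_id,
           -- area of the same user's previous row; NULL for the user's first row
           LAG(location_area_id) OVER (PARTITION BY userId ORDER BY createdAt, id)
               AS previous_area
    FROM area_hits
    ORDER BY createdAt DESC, id DESC
    """
).fetchall()

for row in rows:
    print(row)

previous_area = {row[0]: row[4] for row in rows}
```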
Window functions are applied to the result of a query after the query is essentially complete, so the only way to filter on their output is to wrap the query in a subquery. To get only the rows where the location changed:
SELECT * FROM (
    -- insert big query here
) AS `transitions`
WHERE location_area_id != previous_area OR previous_area IS NULL
You may not care about the case where previous_area is NULL; it is just the first record for each user. But if the user hasn't changed locations since, then that record may be relevant to you.
Now we have a list of every time the location changed:
+-------+--------+---------------------+------------------+---------------+
| id | userId | createdAt | location_area_id | previous_area |
+-------+--------+---------------------+------------------+---------------+
| 11758 | 1 | 2022-07-18 16:59:40 | 24 | 2 |
| 11755 | 1 | 2022-07-18 16:57:06 | 2 | 24 |
| 11747 | 6 | 2022-07-18 16:51:56 | 5 | 24 |
| 11744 | 6 | 2022-07-18 16:51:04 | 24 | NULL |
| 11738 | 1 | 2022-07-18 16:48:29 | 24 | NULL |
+-------+--------+---------------------+------------------+---------------+
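The same filter can be checked against the sample data with the SQLite stand-in (an assumption for testing, not MariaDB itself; area_hits is the same hypothetical table of sample rows). The window query becomes a derived table, and the outer WHERE keeps only rows whose area differs from the previous one:

```python
import sqlite3

# Sample rows from the question: (id, userId, createdAt, location_area_id).
ROWS = [
    (11765, 1, "2022-07-18 17:03:23", 24),
    (11764, 1, "2022-07-18 17:03:07", 24),
    (11763, 1, "2022-07-18 17:02:25", 24),
    (11762, 1, "2022-07-18 17:02:16", 24),
    (11761, 1, "2022-07-18 17:01:24", 24),
    (11760, 1, "2022-07-18 17:00:32", 24),
    (11759, 1, "2022-07-18 16:59:41", 24),
    (11758, 1, "2022-07-18 16:59:40", 24),
    (11757, 1, "2022-07-18 16:58:49", 2),
    (11756, 1, "2022-07-18 16:58:04", 2),
    (11755, 1, "2022-07-18 16:57:06", 2),
    (11754, 1, "2022-07-18 16:56:23", 24),
    (11752, 1, "2022-07-18 16:56:14", 24),
    (11753, 1, "2022-07-18 16:56:14", 24),
    (11751, 1, "2022-07-18 16:54:31", 24),
    (11750, 1, "2022-07-18 16:54:30", 24),
    (11749, 6, "2022-07-18 16:53:39", 5),
    (11748, 6, "2022-07-18 16:52:47", 5),
    (11747, 6, "2022-07-18 16:51:56", 5),
    (11746, 6, "2022-07-18 16:51:55", 24),
    (11744, 6, "2022-07-18 16:51:04", 24),
    (11745, 1, "2022-07-18 16:51:04", 24),
    (11743, 1, "2022-07-18 16:50:13", 24),
    (11740, 1, "2022-07-18 16:49:20", 24),
    (11738, 1, "2022-07-18 16:48:29", 24),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE area_hits "
    "(id INTEGER, userId INTEGER, createdAt TEXT, location_area_id INTEGER)"
)
conn.executemany("INSERT INTO area_hits VALUES (?, ?, ?, ?)", ROWS)

transitions = conn.execute(
    """
    SELECT id, userId, createdAt, location_area_id, previous_area
    FROM (
        SELECT id, userId, createdAt, location_area_id,
               LAG(location_area_id) OVER (PARTITION BY userId ORDER BY createdAt, id)
                   AS previous_area
        FROM area_hits
    ) AS transitions
    -- keep area changes, plus each user's very first row
    WHERE location_area_id != previous_area OR previous_area IS NULL
    ORDER BY createdAt DESC, id DESC
    """
).fetchall()

transition_ids = [row[0] for row in transitions]
print(transition_ids)
```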
The remaining challenge is to find the LATEST of these rows for each user. It seems like a good time for another window function, but window functions cannot be nested in MySQL 8. We can modify the above to get that result instead:
SELECT MAX(id) AS last_transition FROM (
    -- insert big query here
) AS t
WHERE location_area_id != previous_area OR previous_area IS NULL
GROUP BY userId
which yields
+-----------------+
| last_transition |
+-----------------+
| 11758 |
| 11747 |
+-----------------+
A quick check shows that this agrees with the records you indicate in the question. Note that MAX(id) picks the latest transition only because id increases along with createdAt in this data.
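A minimal sketch of the MAX(id) step on the sample rows, again using SQLite as a stand-in for MariaDB and the hypothetical area_hits table; userId is selected alongside last_transition only to make the output easier to read:

```python
import sqlite3

# Sample rows from the question: (id, userId, createdAt, location_area_id).
ROWS = [
    (11765, 1, "2022-07-18 17:03:23", 24),
    (11764, 1, "2022-07-18 17:03:07", 24),
    (11763, 1, "2022-07-18 17:02:25", 24),
    (11762, 1, "2022-07-18 17:02:16", 24),
    (11761, 1, "2022-07-18 17:01:24", 24),
    (11760, 1, "2022-07-18 17:00:32", 24),
    (11759, 1, "2022-07-18 16:59:41", 24),
    (11758, 1, "2022-07-18 16:59:40", 24),
    (11757, 1, "2022-07-18 16:58:49", 2),
    (11756, 1, "2022-07-18 16:58:04", 2),
    (11755, 1, "2022-07-18 16:57:06", 2),
    (11754, 1, "2022-07-18 16:56:23", 24),
    (11752, 1, "2022-07-18 16:56:14", 24),
    (11753, 1, "2022-07-18 16:56:14", 24),
    (11751, 1, "2022-07-18 16:54:31", 24),
    (11750, 1, "2022-07-18 16:54:30", 24),
    (11749, 6, "2022-07-18 16:53:39", 5),
    (11748, 6, "2022-07-18 16:52:47", 5),
    (11747, 6, "2022-07-18 16:51:56", 5),
    (11746, 6, "2022-07-18 16:51:55", 24),
    (11744, 6, "2022-07-18 16:51:04", 24),
    (11745, 1, "2022-07-18 16:51:04", 24),
    (11743, 1, "2022-07-18 16:50:13", 24),
    (11740, 1, "2022-07-18 16:49:20", 24),
    (11738, 1, "2022-07-18 16:48:29", 24),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE area_hits "
    "(id INTEGER, userId INTEGER, createdAt TEXT, location_area_id INTEGER)"
)
conn.executemany("INSERT INTO area_hits VALUES (?, ?, ?, ?)", ROWS)

last = conn.execute(
    """
    SELECT userId, MAX(id) AS last_transition
    FROM (
        SELECT id, userId, location_area_id,
               LAG(location_area_id) OVER (PARTITION BY userId ORDER BY createdAt, id)
                   AS previous_area
        FROM area_hits
    ) AS t
    WHERE location_area_id != previous_area OR previous_area IS NULL
    -- one row per user: the highest (i.e. latest) transition id
    GROUP BY userId
    ORDER BY userId
    """
).fetchall()

print(last)
```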
So now we can take the big query and join it with this result (which also uses the big query) to get the complete answer:
WITH big_query AS (
    SELECT
        location_history.id,
        location_history.userId,
        location_history.createdAt,
        s.location_area_id,
        LAG(location_area_id) OVER (PARTITION BY userId ORDER BY createdAt) AS previous_area
    FROM
        location_history
    JOIN (
        SELECT
            location_area_points.location_area_id,
            ST_PolygonFromText(
                CONCAT(
                    "POLYGON((",
                    GROUP_CONCAT(
                        CONCAT(
                            location_area_points.latitude,
                            ' ',
                            location_area_points.longitude
                        ) SEPARATOR ', '
                    ),
                    "))"
                )
            ) AS polygon
        FROM
            location_area_points
        GROUP BY
            location_area_points.location_area_id
    ) s
    ON
        ST_CONTAINS(
            s.polygon,
            POINT(
                location_history.latitude,
                location_history.longitude
            )
        )
    ORDER BY
        createdAt DESC
)
SELECT * FROM big_query
JOIN (
    SELECT MAX(id) AS id FROM big_query
    WHERE location_area_id != previous_area OR previous_area IS NULL
    GROUP BY userId
) AS last_transitions USING (id)
with the final answer:
+-------+--------+---------------------+------------------+---------------+
| id | userId | createdAt | location_area_id | previous_area |
+-------+--------+---------------------+------------------+---------------+
| 11758 | 1 | 2022-07-18 16:59:40 | 24 | 2 |
| 11747 | 6 | 2022-07-18 16:51:56 | 5 | 24 |
+-------+--------+---------------------+------------------+---------------+
The WITH statement lets you take a SELECT result and treat it like a table in its own right for the duration of the query. You might think of a better name than "big_query" :)
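Here is a minimal sketch of the whole WITH ... JOIN ... USING (id) pattern run end to end. It uses SQLite as a stand-in for MariaDB and the hypothetical area_hits table of sample rows in place of the polygon join, so only the window/CTE part of the query is exercised:

```python
import sqlite3

# Sample rows from the question: (id, userId, createdAt, location_area_id).
ROWS = [
    (11765, 1, "2022-07-18 17:03:23", 24),
    (11764, 1, "2022-07-18 17:03:07", 24),
    (11763, 1, "2022-07-18 17:02:25", 24),
    (11762, 1, "2022-07-18 17:02:16", 24),
    (11761, 1, "2022-07-18 17:01:24", 24),
    (11760, 1, "2022-07-18 17:00:32", 24),
    (11759, 1, "2022-07-18 16:59:41", 24),
    (11758, 1, "2022-07-18 16:59:40", 24),
    (11757, 1, "2022-07-18 16:58:49", 2),
    (11756, 1, "2022-07-18 16:58:04", 2),
    (11755, 1, "2022-07-18 16:57:06", 2),
    (11754, 1, "2022-07-18 16:56:23", 24),
    (11752, 1, "2022-07-18 16:56:14", 24),
    (11753, 1, "2022-07-18 16:56:14", 24),
    (11751, 1, "2022-07-18 16:54:31", 24),
    (11750, 1, "2022-07-18 16:54:30", 24),
    (11749, 6, "2022-07-18 16:53:39", 5),
    (11748, 6, "2022-07-18 16:52:47", 5),
    (11747, 6, "2022-07-18 16:51:56", 5),
    (11746, 6, "2022-07-18 16:51:55", 24),
    (11744, 6, "2022-07-18 16:51:04", 24),
    (11745, 1, "2022-07-18 16:51:04", 24),
    (11743, 1, "2022-07-18 16:50:13", 24),
    (11740, 1, "2022-07-18 16:49:20", 24),
    (11738, 1, "2022-07-18 16:48:29", 24),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE area_hits "
    "(id INTEGER, userId INTEGER, createdAt TEXT, location_area_id INTEGER)"
)
conn.executemany("INSERT INTO area_hits VALUES (?, ?, ?, ?)", ROWS)

final = conn.execute(
    """
    WITH big_query AS (
        SELECT id, userId, createdAt, location_area_id,
               LAG(location_area_id) OVER (PARTITION BY userId ORDER BY createdAt, id)
                   AS previous_area
        FROM area_hits
    )
    -- big_query is referenced twice: once for the rows, once for MAX(id) per user
    SELECT * FROM big_query
    JOIN (
        SELECT MAX(id) AS id FROM big_query
        WHERE location_area_id != previous_area OR previous_area IS NULL
        GROUP BY userId
    ) AS last_transitions USING (id)
    ORDER BY userId
    """
).fetchall()

for row in final:
    print(row)
```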
Thanks to @Jerry (https://stackoverflow.com/a/73040422/2294803) for his amazing answer. I used his example, and the final query I came up with, which works on MariaDB 10.7.3, is below for completeness:
SELECT
    *,
    MAX(id) AS last_transition
FROM
    (
        SELECT
            *
        FROM
            (
                SELECT
                    location_history.id,
                    location_history.userId,
                    location_history.createdAt,
                    s.location_area_id,
                    LAG(location_area_id) OVER(
                        PARTITION BY userId
                        ORDER BY createdAt
                    ) AS previous_area
                FROM
                    location_history
                JOIN (
                    SELECT
                        location_area_points.location_area_id,
                        ST_PolygonFromText(
                            CONCAT(
                                "POLYGON((",
                                GROUP_CONCAT(
                                    CONCAT(
                                        location_area_points.latitude,
                                        ' ',
                                        location_area_points.longitude
                                    ) SEPARATOR ', '
                                ),
                                "))"
                            )
                        ) AS POLYGON
                    FROM
                        location_area_points
                    GROUP BY
                        location_area_points.location_area_id
                ) s
                ON
                    ST_CONTAINS(
                        s.polygon,
                        POINT(
                            location_history.latitude,
                            location_history.longitude
                        )
                    )
                ORDER BY
                    createdAt DESC
            ) AS `transitions`
        WHERE
            location_area_id != previous_area OR previous_area IS NULL
    ) AS t
WHERE
    location_area_id != previous_area OR previous_area IS NULL
GROUP BY
    userId
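+-------+--------+---------------------+------------------+---------------+-----------------+
| id    | userId | createdAt           | location_area_id | previous_area | last_transition |
+-------+--------+---------------------+------------------+---------------+-----------------+
| 11758 |      1 | 2022-07-18 16:59:40 |               24 |             2 |           11758 |
|  5121 |      3 | 2022-07-18 00:05:30 |                2 |            10 |            5121 |
|  2364 |      4 | 2022-05-03 22:59:48 |               11 |             2 |            2364 |
| 12978 |      5 | 2022-07-19 17:12:41 |                2 |            10 |           12978 |
|  1747 |     12 | 2022-05-03 12:23:35 |                2 |          NULL |            1747 |
|  1703 |     14 | 2022-05-03 02:49:57 |               24 |          NULL |            1703 |
|  1734 |     17 | 2022-05-03 11:08:43 |               24 |          NULL |            1734 |
|  2623 |     24 | 2022-05-29 11:55:59 |                2 |          NULL |            2623 |
|  2610 |     25 | 2022-05-17 07:39:02 |                2 |          NULL |            2610 |
|  2620 |     29 | 2022-05-29 11:48:04 |               13 |             2 |            2620 |
|  2629 |     35 | 2022-05-29 13:40:45 |                2 |          NULL |            2629 |
|  3215 |     36 | 2022-07-17 22:48:32 |               24 |            25 |            3215 |
| 11777 |     41 | 2022-07-19 09:47:30 |               24 |          NULL |           11777 |
|  3252 |     42 | 2022-07-17 22:50:09 |               24 |          NULL |            3252 |
+-------+--------+---------------------+------------------+---------------+-----------------+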
which seems to be exactly what I have been looking for!
Thanks to @Jerry
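One caveat about the query above: with SELECT * and GROUP BY userId, MariaDB (like MySQL without ONLY_FULL_GROUP_BY) may take the non-aggregated columns from any row in the group, so id, createdAt and the rest are not guaranteed to come from the MAX(id) row, even though they matched in this output. A deterministic alternative is to rank the transitions with ROW_NUMBER() in an outer query. Because the second window function runs over a derived result rather than being nested inside the first, this should be accepted by MariaDB 10.2+ and MySQL 8.0 as far as I know. The sketch below runs it on SQLite as a stand-in, with the hypothetical area_hits table of sample rows in place of the polygon join:

```python
import sqlite3

# Sample rows from the question: (id, userId, createdAt, location_area_id).
ROWS = [
    (11765, 1, "2022-07-18 17:03:23", 24),
    (11764, 1, "2022-07-18 17:03:07", 24),
    (11763, 1, "2022-07-18 17:02:25", 24),
    (11762, 1, "2022-07-18 17:02:16", 24),
    (11761, 1, "2022-07-18 17:01:24", 24),
    (11760, 1, "2022-07-18 17:00:32", 24),
    (11759, 1, "2022-07-18 16:59:41", 24),
    (11758, 1, "2022-07-18 16:59:40", 24),
    (11757, 1, "2022-07-18 16:58:49", 2),
    (11756, 1, "2022-07-18 16:58:04", 2),
    (11755, 1, "2022-07-18 16:57:06", 2),
    (11754, 1, "2022-07-18 16:56:23", 24),
    (11752, 1, "2022-07-18 16:56:14", 24),
    (11753, 1, "2022-07-18 16:56:14", 24),
    (11751, 1, "2022-07-18 16:54:31", 24),
    (11750, 1, "2022-07-18 16:54:30", 24),
    (11749, 6, "2022-07-18 16:53:39", 5),
    (11748, 6, "2022-07-18 16:52:47", 5),
    (11747, 6, "2022-07-18 16:51:56", 5),
    (11746, 6, "2022-07-18 16:51:55", 24),
    (11744, 6, "2022-07-18 16:51:04", 24),
    (11745, 1, "2022-07-18 16:51:04", 24),
    (11743, 1, "2022-07-18 16:50:13", 24),
    (11740, 1, "2022-07-18 16:49:20", 24),
    (11738, 1, "2022-07-18 16:48:29", 24),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE area_hits "
    "(id INTEGER, userId INTEGER, createdAt TEXT, location_area_id INTEGER)"
)
conn.executemany("INSERT INTO area_hits VALUES (?, ?, ?, ?)", ROWS)

latest = conn.execute(
    """
    WITH big_query AS (
        SELECT id, userId, createdAt, location_area_id,
               LAG(location_area_id) OVER (PARTITION BY userId ORDER BY createdAt, id)
                   AS previous_area
        FROM area_hits
    ),
    ranked AS (
        SELECT *,
               -- rn = 1 marks each user's most recent area change
               ROW_NUMBER() OVER (PARTITION BY userId ORDER BY createdAt DESC, id DESC) AS rn
        FROM big_query
        WHERE location_area_id != previous_area OR previous_area IS NULL
    )
    SELECT id, userId, createdAt, location_area_id, previous_area
    FROM ranked
    WHERE rn = 1
    ORDER BY userId
    """
).fetchall()

for row in latest:
    print(row)
```

Unlike MAX(id), ordering by createdAt (with id only as a tie-breaker) also avoids relying on id increasing over time.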