Query Redshift, grouping by similar strings
I have a table in Amazon Redshift that lists API endpoints and their usage, and I need to query the usage stats. Unfortunately, some of the endpoints include ids in the path, so I need a way of grouping by the endpoint regardless of which id is in the URL.
Example data:
endpoint
'a/b/c'
'a/b/c/19'
'd/20'
'd/1'
'e/f'
'e/f'
I need a query that would take this data and output:
endpoint, count(*)
'a/b/c/*', 2
'd/*', 2
'e/f', 2
So far I have just tried to exclude the ones with specific ids, using something along the lines of:
SELECT
endpoint, count(*)
FROM
api_requests
WHERE
endpoint NOT LIKE '%/[0-9]/%'
GROUP BY
endpoint
ORDER BY
count(*)
DESC;
But a) this doesn't work for some reason, and b) ideally I would group these rows together, ignoring the id, rather than excluding them.
Any help would be greatly appreciated.
You can use regexp_replace():
select regexp_replace(endpoint, '/[0-9]+$', '') as canonical,  -- drop a trailing all-digit segment
       count(*)
from api_requests
group by canonical;
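This strips the last path segment when it consists only of digits, so 'a/b/c/19' and 'a/b/c' both map to 'a/b/c' and are counted together. As for why the LIKE attempt doesn't work: Redshift's LIKE only understands the % and _ wildcards, so [0-9] is matched as literal text rather than as a digit class.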
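If you want to sanity-check the pattern before running it against the real table, here is a minimal Python sketch that applies the same regular expression to the example rows and counts them, mimicking the `group by canonical` query. It is only a local emulation: it assumes Python's `re` treats `/[0-9]+$` the same way Redshift's POSIX regex engine does (which holds for this simple pattern), and the helper names `canonical` and `usage_counts` are hypothetical.

```python
import re
from collections import Counter

# Same pattern as the regexp_replace() call: a slash followed by one or
# more digits, anchored to the end of the string.
TRAILING_ID = re.compile(r"/[0-9]+$")


def canonical(endpoint: str) -> str:
    """Hypothetical helper mirroring regexp_replace(endpoint, '/[0-9]+$', '')."""
    return TRAILING_ID.sub("", endpoint)


def usage_counts(endpoints):
    """Hypothetical helper: the equivalent of SELECT canonical, count(*) ... GROUP BY canonical."""
    return Counter(canonical(e) for e in endpoints)


if __name__ == "__main__":
    # The example rows from the question.
    sample = ["a/b/c", "a/b/c/19", "d/20", "d/1", "e/f", "e/f"]
    for endpoint, n in usage_counts(sample).most_common():
        print(endpoint, n)
```

On the example data this yields a/b/c, d and e/f with a count of 2 each, which matches the expected output apart from the `/*` suffix.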