
Query Redshift grouping by similar strings

I have a table in Amazon Redshift that lists API endpoints and their usage, and I need to query the usage stats. Unfortunately, some of the endpoints include ids in the name, so I need a way of grouping by the endpoint regardless of which id is in the URL.

Example data:

endpoint
'a/b/c'
'a/b/c/19'
'd/20'
'd/1'
'e/f'
'e/f'

I need a query that would take this data and output:

endpoint, count(*)
'a/b/c/*', 2
'd/*',     2
'e/f',     2

So far I have only tried to exclude the ones with specific ids, using something along the lines of:

SELECT 
    endpoint, count(*) 
FROM 
    api_requests 
WHERE 
    endpoint NOT LIKE '%/[0-9]/%'
GROUP BY 
    endpoint 
ORDER BY 
    count(*) 
DESC;

But a) this doesn't work for some reason, and b) ideally I would group them together regardless of the id rather than exclude them.

Any help would be greatly appreciated.

You can use regexp_replace():

select regexp_replace(endpoint, '/[0-9]+$', '') as canonical,
       count(*)
from api_requests 
group by canonical;

This removes the last path segment if it consists entirely of digits. For the sample data, it yields a/b/c, d, and e/f, each with a count of 2.
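
To check the pattern against the sample data without a Redshift cluster, the same query can be run locally. The following is only a sketch under assumptions: it uses an in-memory SQLite database as a stand-in for Redshift, and because SQLite has no built-in regexp_replace, it registers a hypothetical helper of that name backed by Python's re module. For a simple pattern like '/[0-9]+$' the two regex dialects should behave the same, but that is not guaranteed for more complex patterns.

```python
import re
import sqlite3

# Sample rows from the question.
ROWS = ["a/b/c", "a/b/c/19", "d/20", "d/1", "e/f", "e/f"]


def regexp_replace(source, pattern, replacement):
    """Stand-in for Redshift's regexp_replace(), backed by Python's re module."""
    return re.sub(pattern, replacement, source)


def count_by_canonical_endpoint(rows):
    """Run the grouping query against an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    # SQLite has no built-in regexp_replace, so register the stand-in (assumption).
    conn.create_function("regexp_replace", 3, regexp_replace)
    conn.execute("CREATE TABLE api_requests (endpoint TEXT)")
    conn.executemany("INSERT INTO api_requests VALUES (?)", [(r,) for r in rows])
    query = """
        SELECT regexp_replace(endpoint, '/[0-9]+$', '') AS canonical,
               count(*) AS requests
        FROM api_requests
        GROUP BY canonical
        ORDER BY canonical
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for canonical, requests in count_by_canonical_endpoint(ROWS):
        print(canonical, requests)
```

As for why the original attempt returned unexpected rows: in Redshift, LIKE only treats % and _ as wildcards, so '[0-9]' is matched as literal text. The pattern '%/[0-9]/%' would also require a slash after the id, which none of the sample endpoints have.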
