Get last item in SPLIT array using SQL in AWS Athena and PrestoDB

Question

I'm trying to get the last item of the array produced by splitting a string. In JavaScript I'd do this easily with url.split('//')[url.split('//').length-1].

How do I do this in SQL on AWS Athena (which I believe is actually Presto)?

-- imagine a url is like 'http://www.google.com'
-- (my_table is a placeholder for a table with a varchar column url)
SELECT SPLIT(url, '//')[2]
FROM my_table

This returns www.google.com.

But in some rows the URL has no protocol, so there is no second element and I would need [1] instead of [2].

-- imagine a url is like 'www.google.com'
SELECT SPLIT(url, '//')[2]
FROM my_table

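This fails with an array-subscript-out-of-bounds error, because splitting 'www.google.com' on '//' yields a single-element array.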

How do I get the last item in the array?

Answer 1

You can use a regular expression with an optional non-capturing group for the protocol:

select regexp_extract('www.google.com', '(?:https?://)?(.*)',1)

and

select regexp_extract('http://www.google.com', '(?:https?://)?(.*)',1)

will both return www.google.com

Note that the protocol can be either HTTP or HTTPS; the s? in the pattern makes the s optional, so both are handled.
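To check the pattern against several inputs at once, here is a minimal sketch that runs it over an inline VALUES list instead of a real table. The alias t and column name url are placeholders, and it assumes Athena's Presto/Trino engine, where regexp_extract(string, pattern, group) returns the given capture group of the first match.

```sql
-- Sketch: apply the protocol-stripping regex to a few sample URLs.
-- The inline VALUES list stands in for a real table; t and url are placeholder names.
SELECT
    url,
    -- (?:https?://)? optionally consumes "http://" or "https://" without capturing it;
    -- capture group 1, (.*), takes everything that follows.
    regexp_extract(url, '(?:https?://)?(.*)', 1) AS without_protocol
FROM (
    VALUES 'http://www.google.com', 'https://www.google.com', 'www.google.com'
) AS t (url)
```

Each row should yield www.google.com in without_protocol. Anything after the host, such as a path, is kept as well.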

Answer 2

You have 2 options:

The first option is the element_at function; with an index of -1 it returns the last element of the array. Both of the following give the desired output:

SELECT ELEMENT_AT(SPLIT('http://www.google.com', '//'), -1)
SELECT ELEMENT_AT(SPLIT('www.google.com', '//'), -1)
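Applied to a column, element_at is also the safer choice for the original problem: in Presto/Trino it returns NULL when a positive index is beyond the array length, whereas the [] subscript operator fails. The sketch below (again using a placeholder inline table t with column url) also shows the direct SQL equivalent of the JavaScript length-1 idiom using cardinality; SQL arrays are 1-based, so the last index is simply the cardinality.

```sql
-- Sketch: three ways to read the split array; t and url are placeholder names.
SELECT
    url,
    -- negative index counts from the end of the array
    element_at(split(url, '//'), -1) AS last_part,
    -- arrays are 1-based, so the last index equals the array length
    split(url, '//')[cardinality(split(url, '//'))] AS last_part_by_length,
    -- returns NULL (instead of failing) when there is no second element
    element_at(split(url, '//'), 2) AS second_part_or_null
FROM (
    VALUES 'http://www.google.com', 'www.google.com'
) AS t (url)
```

Both rows should give www.google.com in the first two computed columns; second_part_or_null is NULL for the row without a protocol.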

The second option is the slice function, which returns a sub-array; you then take its first (and only) element. Both of the following give the desired output:

SELECT SLICE(SPLIT('http://www.google.com', '//'), -1, 1)[1]
SELECT SLICE(SPLIT('www.google.com', '//'), -1, 1)[1]

-1 is the start index of the sub-array (a negative start counts from the end of the array)
1 is the length of the sub-array

For more on how the slice function works, see: https://trino.io/docs/current/functions/array.html#slice
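Because slice takes a length, it also generalizes to "the last N items". As an illustrative sketch only, the query below keeps the last two dot-separated labels of the host and re-joins them with array_join. The inline table t and column url are placeholders; split treats '.' as a literal delimiter, not a regex.

```sql
-- Sketch: last two labels of the host, e.g. 'www.google.com' -> 'google.com'.
SELECT
    url,
    array_join(
        -- strip any protocol, split the host on '.', then take 2 elements starting at -2
        slice(split(element_at(split(url, '//'), -1), '.'), -2, 2),
        '.'
    ) AS last_two_labels
FROM (
    VALUES 'http://www.google.com', 'www.google.com'
) AS t (url)
```

Both rows should return google.com. This is only a rough heuristic: hosts such as www.example.co.uk would need more than two labels.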

2 answers

solution1
0 2021-03-07 18:59:15

solution2
0 2023-01-02 13:06:54
