
Get last item in SPLIT array using SQL in AWS Athena and PrestoDB

I'm trying to get the last item in an array after splitting a string. In JavaScript I'd do this easily with url.split('//')[url.split('//').length-1].

But how do I do this in SQL on AWS Athena (which I believe is actually Presto under the hood)?

-- imagine a url is like 'http://www.google.com'

SELECT SPLIT(url, '//')[2]
FROM table

This returns www.google.com.

But some URLs have no protocol prefix, so the split produces only one element and I would need [1] instead of [2].

-- imagine a url is like 'www.google.com'

SELECT SPLIT(url, '//')[2]
FROM table

This fails with an error, because the array has only one element.

How do I get the last item in the array?

You can use a regular expression with an optional non-capturing group for the protocol:

select regexp_extract('www.google.com', '(?:https?://)?(.*)',1)

and

select regexp_extract('http://www.google.com', '(?:https?://)?(.*)',1)

will both return www.google.com

Note that the protocol can be either http or https; the s? in the regex makes the s optional, so both are matched.
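To try the pattern outside Athena, here is a minimal Python sketch. It assumes Python's re module treats this simple pattern the same way as Athena's regexp_extract, which returns the requested capture group of the first match; the helper name strip_protocol is hypothetical.

```python
import re

# Same pattern as the SQL above: an optional, non-capturing "http://" or
# "https://" prefix, then a capturing group for everything after it.
URL_PATTERN = re.compile(r"(?:https?://)?(.*)")


def strip_protocol(url: str) -> str:
    """Hypothetical helper mimicking regexp_extract(url, pattern, 1)."""
    match = URL_PATTERN.search(url)
    # The pattern can match an empty string, so search() always finds a match.
    return match.group(1)


for url in ["www.google.com", "http://www.google.com", "https://www.google.com"]:
    print(url, "->", strip_protocol(url))
```

Because the protocol group is optional, a URL with another scheme, such as ftp://example.com, comes back unchanged, and any path after the host is kept in the result.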

You have 2 options:

Use the element_at function with index -1. A negative index counts from the end of the array, so both of these give the desired output:

SELECT ELEMENT_AT(SPLIT('http://www.google.com', '//'), -1)
SELECT ELEMENT_AT(SPLIT('www.google.com', '//'), -1)
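The difference between the subscript operator used in the question and element_at can be modeled in a few lines of Python. This is a sketch of the documented Presto/Trino semantics (1-based indexes, negative element_at indexes counting from the end, NULL instead of an error when out of range), not Athena itself; the function names mirror the SQL ones, the error messages are illustrative, and Python's str.split('//') is assumed to match SPLIT for these inputs.

```python
from typing import List, Optional


def subscript(arr: List[str], index: int) -> str:
    """Model of the SQL subscript arr[index]: 1-based, fails when out of range."""
    if not 1 <= index <= len(arr):
        raise IndexError(f"Array subscript out of bounds: {index}")
    return arr[index - 1]


def element_at(arr: List[str], index: int) -> Optional[str]:
    """Model of element_at(arr, index): negative indexes count from the end,
    and an out-of-range index yields None (SQL NULL) instead of an error."""
    if index == 0:
        # Assumed: index 0 is rejected because SQL arrays are 1-based.
        raise ValueError("SQL array indices start at 1")
    position = index - 1 if index > 0 else len(arr) + index
    if 0 <= position < len(arr):
        return arr[position]
    return None


for url in ["http://www.google.com", "www.google.com"]:
    parts = url.split("//")  # stands in for SPLIT(url, '//')
    print(url, "->", element_at(parts, -1))
```

Calling subscript(["www.google.com"], 2) raises, which is the failure described in the question, while element_at with -1 returns the last element whether or not a protocol was present.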

The second option is to use the slice function to take a sub-array containing only the last element, then read its first item. Both of these give the desired output:

SELECT SLICE(SPLIT('http://www.google.com', '//'), -1, 1)[1]
SELECT SLICE(SPLIT('www.google.com', '//'), -1, 1)[1]

  • -1 is the start index of the sub-array; a negative start counts from the end of the array

  • 1 is the length of the sub-array

For details on how the slice function works, see https://trino.io/docs/current/functions/array.html#slice
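The slice approach can be modeled the same way. This Python sketch assumes the semantics described above (1-based start, negative start counting from the end, length as the sub-array size); the handling of a zero start, a negative length, and an out-of-range start is an assumption marked in the comments, and the function is named array_slice only to avoid shadowing Python's built-in slice.

```python
from typing import List


def array_slice(arr: List[str], start: int, length: int) -> List[str]:
    """Model of slice(arr, start, length) with 1-based, possibly negative start."""
    if start == 0:
        # Assumed: rejected because SQL arrays are 1-based.
        raise ValueError("SQL array indices start at 1")
    if length < 0:
        # Assumed: a negative length is rejected.
        raise ValueError("length must be greater than or equal to 0")
    begin = start - 1 if start > 0 else len(arr) + start
    if begin < 0 or begin >= len(arr):
        return []  # assumed: an out-of-range start yields an empty array
    # Python slicing clamps at the end, so a long length just stops at the last item.
    return arr[begin:begin + length]


for url in ["http://www.google.com", "www.google.com"]:
    last = array_slice(url.split("//"), -1, 1)  # a one-element list
    print(url, "->", last[0])  # SQL's [1] corresponds to Python's [0]
```

Compared with element_at this takes an extra step (build a one-element array, then subscript it), so element_at is the more direct choice here; slice(arr, -n, n) is handier when you want the last n elements.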
