Filter only elements that matches a regex in Athena

Question

I have a column in a table which is of string type and the column is , separated.

Sample Input: 'ASEIAW,1245555,asda2dd,TPOIBV'
Expected output: ['ASEIAW,TPOIBV'] - An array with all matching elements which is an alphabet in upper case with exactly 6 charterers.

What I tried;

select REGEXP_EXTRACT('ASEIOW,ASDWQB,TPOIBV,2' , '(\b[A-Z]{6,6}\b)+');
Output: ASEIOW

select REGEXP_LIKE('ASEIOW,ASDWQB,TPOIBV,2' , '(\b[A-Z]{6,6}\b)+');
Output: [v]

select REGEXP_SPLIT('ASEIOW,ASDWQB,TPOIBV,2' , '(\b[A-Z]{6,6}\b)+');
Output: ['',',',',',',2']

Using NOT in front of regex

select REGEXP_SPLIT('ASEIOW,ASDWQB,TPOIBV,2' , '^(\b[A-Z]{6,6}\b)+');
Output: ['',',ASDWQB,TPOIBV,2']

select REGEXP_REPLACE('ASEIOW,ASDWQB,TPOIBV,2' , '^(\b[A-Z]{6,6}\b)+');
Output: ,ASDWQB,TPOIBV,2

Answer 1

You can use a REGEX_EXTRACT_ALL with a bit simplified regex:

REGEXP_EXTRACT_ALL('ASEIOW,ASDWQB,TPOIBV,2' , '\b([A-Z]{6})\b');

The REGEXP_EXTRACT_ALL function will extract all occurrences of the pattern matches and \b([AZ]{6})\b just matches six letters enclosed with word boundaries, no need to specify the identical min and max values in range quantifiers. Nor do you need to additionally quantify the pattern.

Answer 2

You can use REGEX_EXTRACT_ALL :

select regexp_extract_all('ASEIOW,ASDWQB,TPOIBV,2', '\b([A-Z]{6})\b');

Output:

_col0
[ASEIOW, ASDWQB, TPOIBV]

Answer 3

Good day. U can use nested regexp_replace like this.

SELECT sentences,REGEXP_REPLACE(REGEXP_REPLACE(REGEXP_REPLACE(sentences,'.[[:lower:]0-9]'),'^','['''),'$',''']') as result
          FROM  
           (   
            SELECT 'ASEIAW,1245555,asda2dd,TPOIBV' AS sentences from dual
            )

Filter only elements that matches a regex in Athena

Question

3 answers

solution1
1 2022-07-18 11:08:20

solution2
0 2022-07-18 11:09:54

solution3
0 2022-07-19 12:50:24

Filter only elements that matches a regex in Athena

Question

3 answers

solution1 1 2022-07-18 11:08:20

solution2 0 2022-07-18 11:09:54

solution3 0 2022-07-19 12:50:24

solution1
1 2022-07-18 11:08:20

solution2
0 2022-07-18 11:09:54

solution3
0 2022-07-19 12:50:24