I want to extract the values between tags and create new columns from them.
For example, my column (a varchar) holds the following value:
Working : History 0 : <Site Details.Number of Complaints>WAS<>IS<3>
I need to extract 3 columns from this: 1. Site Details.Number of Complaints 2. blank (null) 3. 3
since the three values are each enclosed between an opening tag ('<') and a closing tag ('>').
I already tried REGEXP_SUBSTR and STRTOK, but I am not able to extract the second value as null.
Query so far:
SELECT
   STRTOK(STRTOK('Working : History 0 : <Site Details.Number of Complaints>WAS<>IS<3>', '<', 1), '>', 1) AS col_a
  ,STRTOK(STRTOK('Working : History 0 : <Site Details.Number of Complaints>WAS<>IS<3>', '<', 2), '>', 1) AS col_b
  ,STRTOK(STRTOK('Working : History 0 : <Site Details.Number of Complaints>WAS<>IS<3>', '<', 3), '>', 1) AS col_c
  ,STRTOK(STRTOK('Working : History 0 : <Site Details.Number of Complaints>WAS<>IS<3>', '<', 4), '>', 1) AS col_d
Output:
col_a                 | col_b                             | col_c | col_d
Working : History 0 : | Site Details.Number of Complaints | IS    | 3
FYI: every value in the column will have exactly 3 opening and closing tags. I need Teradata SQL for this.
As you noticed, STRTOK can't be used for this: it only tokenizes strings with very basic rules and skips empty tokens, so the empty <> never comes back as a value.
You need a RegEx:
SELECT
RegExp_Substr(col, '<\K.*?(?=>)',1,1)
,RegExp_Substr(col, '<\K.*?(?=>)',1,2)
,RegExp_Substr(col, '<\K.*?(?=>)',1,3)
,'Working : History 0 : <Site Details.Number of Complaints>WAS<>IS<3>' AS col
<\K.*?(?=>)
<\K = check for '<', but don't add it to the result (similar to a positive lookbehind, which will not work in this case)
.*? = any characters, i.e. the expected result
(?=>) = check for '>' without adding it to the result, i.e. positive lookahead
See RegEx101 for details.
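Applied to a real table instead of a string literal, the same pattern could be used roughly as in the sketch below. The table name complaints_log, the column history_text, and the output aliases are hypothetical and not from the question. The NULLIF wrapper is an assumption on my part: I have not verified whether an empty match comes back as '' or as NULL, and NULLIF(..., '') yields NULL in either case.

```sql
-- Hypothetical table/column names; replace with your own.
SELECT
   t.history_text
   -- 1st <...> pair: the field name
  ,NULLIF(RegExp_Substr(t.history_text, '<\K.*?(?=>)', 1, 1), '') AS field_name
   -- 2nd <...> pair: the old value, NULL when the tag is empty (<>)
  ,NULLIF(RegExp_Substr(t.history_text, '<\K.*?(?=>)', 1, 2), '') AS old_value
   -- 3rd <...> pair: the new value
  ,NULLIF(RegExp_Substr(t.history_text, '<\K.*?(?=>)', 1, 3), '') AS new_value
FROM complaints_log AS t;
```

The fourth argument of RegExp_Substr selects which occurrence of the match to return, which is why the same pattern can be reused for all three columns.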