
How to convert a string to an array in Hive?

The value of the column is like this:

["a", "b", "c(d, e)"]

Here the value is of string type. I want to convert the string to an array, and I tried split(column_name, ','). However, because an element of the array can itself contain a comma (e.g. "c(d, e)"), this splits in the wrong places. Is there any other way to convert the string to an array?

In this case you can split only on the commas that sit between double quotes.

The regular expression '(?<="), *(?=")' matches a comma followed by optional spaces only when it is preceded by " and followed by ". The quotes themselves are not part of the match, so they stay in the elements.

(?<=") is a zero-width positive lookbehind: it asserts that the character immediately before the current position is ".

(?=") is a zero-width positive lookahead: it asserts that the character immediately after the current position is ".
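As a quick check of the difference, the sketch below compares the number of pieces produced by a plain comma split with the number produced by the lookaround pattern. The input literal is the sample value with its square brackets already removed, and the column aliases naive_count and lookaround_count are made up for illustration.

```sql
-- Sketch only: the aliases are hypothetical names
select size(split('"a", "b", "c(d, e)"', ','))             as naive_count,      -- the comma inside c(d, e) is also a split point
       size(split('"a", "b", "c(d, e)"', '(?<="), *(?=")')) as lookaround_count  -- only commas between two quotes are split points
```

The plain split should report 4 pieces, because it also cuts inside "c(d, e)", while the lookaround split should report 3.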

After splitting this way, the array elements still carry their quotes, for example "a". To remove them, use regexp_replace:

Demo:

with your_data as (
  select '["a", "b", "c(d, e)"]' as str
) 

select split(str, '(?<="), *(?=")')       as splitted_array, 
       element, 
       regexp_replace(element,'^"|"$','') as element_unquotted
  from (
        select regexp_replace(str,'^\\[|\\]$','') as str --remove square brackets
         from your_data 
       ) d
       --explode array   
       lateral view explode(split(str, '(?<="), *(?=")')) e as element 

Result:

 splitted_array                       element      element_unquotted
 ["\"a\"","\"b\"","\"c(d, e)\""]       "a"          a
 ["\"a\"","\"b\"","\"c(d, e)\""]       "b"          b
 ["\"a\"","\"b\"","\"c(d, e)\""]       "c(d, e)"    c(d, e)

If you need an array of the unquoted elements, you can aggregate them back into an array with collect_list.
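A minimal sketch of that last step, reusing the same sample data and patterns as the demo above. The alias unquoted_array is made up for illustration. With more than one source row you would also need a GROUP BY on some row key, which this sample does not have, and collect_list does not formally guarantee element order.

```sql
with your_data as (
  select '["a", "b", "c(d, e)"]' as str
)
-- Sketch only: collapses every exploded element into a single array
select collect_list(regexp_replace(element,'^"|"$','')) as unquoted_array --strip the surrounding quotes, then re-collect
  from (
        select regexp_replace(str,'^\\[|\\]$','') as str --remove square brackets
          from your_data
       ) d
       --explode array
       lateral view explode(split(str, '(?<="), *(?=")')) e as element
```

For the sample row this should produce one array holding a, b and c(d, e).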

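Another way is to replace each ", " separator between elements (closing quote, comma, optional spaces, opening quote) with some delimiter, remove all remaining quotes and the square brackets, and then split on that delimiter.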

Demo:

with your_data as (
  select '["a", "b", "c(d, e)"]' as str
) 
select split(str,  '\\|\\|\\|') splitted_array 
  from (--replace each '", "' separator with |||, then remove remaining quotes and square brackets
         select regexp_replace(regexp_replace(str,'", *"','|||'),'^\\[|\\]$|"','') as str 
         from your_data ) d

Result:

splitted_array
["a","b","c(d, e)"]
