Use Str_to_map in bigquery

Question

I have a str_to_map() call in Hive that I need to convert to BigQuery. Since BigQuery has no map type, I am looking for another way to build a map-like structure and then extract values from it by key name.

Example:
Select str_to_map('cars:0,kids:143,cats:1,lost:0,win:1,chances:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0,missed:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0',',',':')

If I look up the key 'cars' I get the value '0'. If I look up the key 'chances' I should get everything after the first colon: '0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0'

It's necessary for me to have a type like the 'map' type (key-value).

Thank you

Answer 1

Google provides some useful UDFs for BigQuery in the bigquery-utils repository.

Don't reinvent the wheel

So I picked two UDFs from it to answer this question.

1. get_value(k STRING, arr ANY TYPE)

Given a key and a list of key-value maps in the form [{'key': 'a', 'value': 'aaa'}], returns the SCALAR type value.

2. cw_map_parse(m string, pd string, kvd string)

Converts a string to a map.

With these, you can write a query like the one below:

SELECT get_value('kids', cw_map_parse(str, ',', ':')) kids,
       get_value('chances', cw_map_parse(str, ',', ':')) chances,
  FROM UNNEST(['cars:0,kids:143,cats:1,lost:0,win:1,chances:0,missed:0']) str;
+------+---------+
| kids | chances |
+------+---------+
|  143 |       0 |
+------+---------+
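The query above calls the UDFs by bare name, which only works if they are defined in the current session or dataset. As a sketch, assuming the community UDFs are reachable under the public bqutil.fn dataset (as the bigquery-utils README describes; availability may depend on your region), the same query could be written with qualified names:

```sql
-- Assumption: bqutil.fn hosts get_value and cw_map_parse for your region.
-- If it does not, deploy the UDFs to your own dataset and adjust the prefix.
SELECT
  bqutil.fn.get_value('kids', bqutil.fn.cw_map_parse(str, ',', ':')) AS kids,
  bqutil.fn.get_value('chances', bqutil.fn.cw_map_parse(str, ',', ':')) AS chances
FROM UNNEST(['cars:0,kids:143,cats:1,lost:0,win:1,chances:0,missed:0']) AS str;
```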

However, because of the requirement below, the cw_map_parse implementation needs to be customized a little.

If I call the key 'cars' I get the value '0'. If I call the key 'chances' I should get '0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0'

Below is a query with customized UDFs. str_to_map() is a customized version of cw_map_parse().

CREATE TEMP FUNCTION str_to_map(m string, pd string, kvd string)
RETURNS ARRAY<STRUCT<key STRING, value STRING>> AS (
  ARRAY(
    SELECT AS STRUCT kv[SAFE_OFFSET(0)] AS key, kv[SAFE_OFFSET(1)] AS value
      FROM (
        SELECT SPLIT(REGEXP_REPLACE(kv, r'^(.*?)' || kvd, r'\1|'), '|') AS kv 
          FROM UNNEST(SPLIT(m, pd)) AS kv
      )
));

CREATE TEMP FUNCTION get_value(get_key STRING, arr ANY TYPE) AS (
  (SELECT value FROM UNNEST(arr) WHERE key = get_key)
);

SELECT get_value('cars', map) cars,
       get_value('kids', map) kids,
       get_value('chances', map) chances,
       get_value('missed', map) missed,
  FROM UNNEST(['cars:0,kids:143,cats:1,lost:0,win:1,chances:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0,missed:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0']) str,
       UNNEST([STRUCT(str_to_map(str, ',', ':') AS map)]);

+------+------+-------------------------------------+-------------------------------------+
| cars | kids |               chances               |               missed                |
+------+------+-------------------------------------+-------------------------------------+
|    0 |  143 | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 |
+------+------+-------------------------------------+-------------------------------------+
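To make the customization concrete, here is a minimal sketch of what str_to_map() does to a single pair, with ':' inlined as the key-value delimiter: only the first delimiter is replaced by a '|' sentinel, and the pair is then split on that sentinel. The same block also sketches a variant, under the hypothetical name str_to_map_strpos, that locates the first delimiter with STRPOS instead. This variant is an assumption of mine rather than part of the original answer. It avoids the sentinel, so values containing '|' and delimiters that are regex metacharacters (such as '.') would not need special handling.

```sql
-- Hypothetical variant: split each pair at the FIRST occurrence of kvd only.
CREATE TEMP FUNCTION str_to_map_strpos(m STRING, pd STRING, kvd STRING)
RETURNS ARRAY<STRUCT<key STRING, value STRING>> AS (
  ARRAY(
    SELECT AS STRUCT
      -- Text before the first delimiter (or the whole pair if there is none).
      IF(STRPOS(kv, kvd) > 0, SUBSTR(kv, 1, STRPOS(kv, kvd) - 1), kv) AS key,
      -- Everything after the first delimiter, later delimiters included.
      IF(STRPOS(kv, kvd) > 0, SUBSTR(kv, STRPOS(kv, kvd) + LENGTH(kvd)), NULL) AS value
    FROM UNNEST(SPLIT(m, pd)) AS kv
  )
);

-- Step-by-step view of the regex approach used by str_to_map().
-- For 'chances:0:0:0', replaced is 'chances|0:0:0' and parts is ['chances', '0:0:0'].
SELECT
  kv,
  REGEXP_REPLACE(kv, r'^(.*?):', r'\1|') AS replaced,
  SPLIT(REGEXP_REPLACE(kv, r'^(.*?):', r'\1|'), '|') AS parts
FROM UNNEST(['cars:0', 'chances:0:0:0']) AS kv;

-- The STRPOS variant applied to a shortened sample string.
SELECT key, value
FROM UNNEST(str_to_map_strpos('cars:0,chances:0:0:0', ',', ':'));
```

Either version returns ARRAY<STRUCT<key STRING, value STRING>>, so get_value() as defined above should work on both without changes.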

Answer 2

Another very simple option for this particular case is to turn the string into JSON and read the keys with json_value:

select 
  json_value(json, '$.cars') cars,
  json_value(json, '$.kids') kids,
  json_value(json, '$.cats') cats,
  json_value(json, '$.lost') lost,
  json_value(json, '$.win') win,
  json_value(json, '$.chances') chances,
  json_value(json, '$.missed') missed
from your_table, 
unnest([format('{%s}', regexp_replace(str, r'([^:,]+):([\d:]*\d)', r'"\1":"\2"'))]) json

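The output has one column per key (cars, kids, cats, lost, win, chances, missed), each holding the value parsed from str.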
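As a sketch of the intermediate step in this approach, the query below exposes the JSON string that regexp_replace and format build before json_value reads from it. The sample string is shortened and inlined as a literal in place of your_table, and the alias json_str is illustrative. Note that the pattern only accepts values made of digits and colons, which is why this option suits this particular input rather than arbitrary maps.

```sql
-- For the literal below, json_str is
--   {"cars":"0","kids":"143","chances":"0:0:0","missed":"0:0"}
-- and chances is '0:0:0'.
SELECT
  json_str,
  JSON_VALUE(json_str, '$.chances') AS chances
FROM UNNEST(['cars:0,kids:143,chances:0:0:0,missed:0:0']) AS str,
UNNEST([FORMAT('{%s}',
  REGEXP_REPLACE(str, r'([^:,]+):([\d:]*\d)', r'"\1":"\2"'))]) AS json_str;
```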

2 answers

solution1
1 2022-09-30 15:50:31

solution2
0 2022-09-30 18:06:47
