I have a table that has some rows with normal JSON and some with escaped values in the JSON field (backslashes)
id | obj | ||
---|---|---|---|
1 | {"is_from_shopping_bag":true,"products":[{"price":{"amount":"18.00","currency":"USD","offset":100,"amount_with_offset":"1800"},"product_id":"1234","quantity":1}],"source":"cart"} | ||
2 | {"is_from_shopping_bag":"","products":"[{\ "product_id\ ":\ "2345\ ",\ "price\ ":{\ "currency\ ":\ "USD\ ",\ "amount\ ":\ "140.00\ ",\ "offset\ ":100},\ "quantity\ ":1}]"} |
(Note: I needed to include a space after the backslashes in the above table so that they would show up in the github generated markdown table -- my actual table does not include those spaces between the backslash and the quote character)
I am doing a sql query in Hive to get the 'currency' field.
Currently I can run
SELECT
id,
JSON_EXTRACT(obj, '$.products[0].price.currency')
FROM my_table
Which will give me the correct output for the first row, but gives me a NULL in the second row
id | obj |
---|---|
1 | "USD" |
2 | NULL |
What is the best way to get currency field from the second row? Is there a way to clean up the field and remove the backslashes before trying to JSON_EXTRACT the relevant data? I could use REPLACE to swap the '\ ' for '', but is that the most efficient method?
Replace \"
with "
using regexp_replace
like this:
regexp_replace(obj,'\\\\"','"')
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.