
SQL: Extract from messy JSON nested field with backslashes

I have a table in which some rows contain normal JSON and others contain escaped values (backslashes) in the JSON field.

id  obj
1   {"is_from_shopping_bag":true,"products":[{"price":{"amount":"18.00","currency":"USD","offset":100,"amount_with_offset":"1800"},"product_id":"1234","quantity":1}],"source":"cart"}
2   {"is_from_shopping_bag":"","products":"[{\"product_id\":\"2345\",\"price\":{\"currency\":\"USD\",\"amount\":\"140.00\",\"offset\":100},\"quantity\":1}]"}

(Note: the backslashes are literal characters stored in the column, with no space between a backslash and the following quote.)

I am running a SQL query in Hive to get the 'currency' field.

Currently I run:

SELECT
    id,
    JSON_EXTRACT(obj, '$.products[0].price.currency')
FROM my_table

This gives the correct output for the first row, but NULL for the second row:

id  currency
1   "USD"
2   NULL
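The NULL for row 2 follows from its shape: the backslashes are JSON string escapes, so "products" is a string that contains JSON text rather than an array, and the path $.products[0] has no array to index. The Python sketch below is used only as a convenient way to inspect the two sample rows outside Hive (the row literals are copied from the question); it shows the difference and that the inner string is itself valid JSON.

```python
import json

# Sample rows from the question, with no space between each backslash and
# the following quote (matching the real column).
ROW1 = '{"is_from_shopping_bag":true,"products":[{"price":{"amount":"18.00","currency":"USD","offset":100,"amount_with_offset":"1800"},"product_id":"1234","quantity":1}],"source":"cart"}'
ROW2 = r'{"is_from_shopping_bag":"","products":"[{\"product_id\":\"2345\",\"price\":{\"currency\":\"USD\",\"amount\":\"140.00\",\"offset\":100},\"quantity\":1}]"}'

for row_id, raw in ((1, ROW1), (2, ROW2)):
    products = json.loads(raw)["products"]
    # Row 1: a list, so $.products[0] resolves.
    # Row 2: a str holding JSON text, so $.products[0] cannot index into it.
    print(row_id, type(products).__name__)
# 1 list
# 2 str

# Decoding row 2's "products" string a second time yields the real array.
inner = json.loads(json.loads(ROW2)["products"])
print(inner[0]["price"]["currency"])  # USD
```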

What is the best way to get the currency field from the second row? Is there a way to clean up the field and remove the backslashes before calling JSON_EXTRACT? I could use REPLACE to swap '\' for '', but is that the most efficient method?

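Replace \" with " using regexp_replace, and also remove the double quotes that wrap the products array (turn "[ into [ and ]" into ]). With only the first replacement, row 2 is still not valid JSON:

regexp_replace(regexp_replace(regexp_replace(obj, '\\\\"', '"'), '"\\[', '['), '\\]"', ']')

Pass this expression to JSON_EXTRACT in place of obj. Row 1 contains none of these sequences, so it is left unchanged.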
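A Python sketch can check the cleanup patterns on the sample rows before running anything in Hive. It assumes Python's re behaves like Java's regex for these simple patterns (literal backslashes, quotes and brackets), and the helper names clean and currency are hypothetical. The patterns are written once-escaped in Python raw strings; the Hive query needs '\\\\"' because Hive string literals consume one level of backslashes. Note that replacing \" with " alone is not enough for row 2: the array stays wrapped in quotes, so the sketch also turns "[ into [ and ]" into ].

```python
import json
import re

# Sample rows from the question (no space between backslash and quote).
ROW1 = '{"is_from_shopping_bag":true,"products":[{"price":{"amount":"18.00","currency":"USD","offset":100,"amount_with_offset":"1800"},"product_id":"1234","quantity":1}],"source":"cart"}'
ROW2 = r'{"is_from_shopping_bag":"","products":"[{\"product_id\":\"2345\",\"price\":{\"currency\":\"USD\",\"amount\":\"140.00\",\"offset\":100},\"quantity\":1}]"}'


def clean(raw: str) -> str:
    """Hypothetical helper mirroring the nested Hive regexp_replace calls."""
    s = re.sub(r'\\"', '"', raw)  # \"  ->  "   (unescape inner quotes)
    s = re.sub(r'"\[', '[', s)    # "[  ->  [   (drop quote opening the array)
    s = re.sub(r'\]"', ']', s)    # ]"  ->  ]   (drop quote closing the array)
    return s


def currency(raw: str):
    """Rough stand-in for JSON_EXTRACT(clean(obj), '$.products[0].price.currency')."""
    try:
        return json.loads(clean(raw))["products"][0]["price"]["currency"]
    except (ValueError, KeyError, IndexError, TypeError):
        return None  # stands in for SQL NULL when the path cannot be resolved


for raw in (ROW1, ROW2):
    print(currency(raw))  # USD, then USD

# Applying only the first substitution leaves row 2 unparseable.
half_cleaned = re.sub(r'\\"', '"', ROW2)
try:
    json.loads(half_cleaned)
    half_ok = True
except ValueError:
    half_ok = False
print("row 2 parses after one replace:", half_ok)  # False
```

The two extra substitutions could misfire only if a string value inside the JSON itself contained "[ or ]", so they are worth checking against real data before relying on them.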
