
Updating json field in Postgres

Querying Postgres 9.3 by json field works really well. However, I couldn't find a built-in way to update the json object, so I use an internal function written in plpythonu, based on a previous post (How do I modify fields inside the new PostgreSQL JSON datatype?):

CREATE OR REPLACE FUNCTION json_update(data json, key text, value json)
RETURNS json AS
$BODY$
   from json import loads, dumps
   if key is None: return data
   js = loads(data)
   js[key] = value
   return dumps(js)
$BODY$
LANGUAGE plpythonu VOLATILE;

It works really well as long as my json updates remain flat and simple. Say "chat" is a json-type field in the "GO_SESSION" table containing {"a":"1","b":"2"}; the following code will change the value of 'b', turning "chat" into {"a":"1","b":"5"}:

update "GO_SESSION" set chat=json_update(chat,'b','5') where id=3

The problem is when I try to assign 'b' another object rather than a simple value:

update "GO_SESSION" set chat=json_update(chat,'b','{"name":"steve"}') where id=3

The result in the database is that 'b' contains an escaped string rather than a real json object:

{"a": "1", "b": "{\\"name\\":\\"steve\\"}"}

I have tried different ways to unescape or dump my json in order to keep 'b' an object, but couldn't find a solution.

Thank you

No eval is required. Your issue is that you're not decoding the value as a json object.

CREATE OR REPLACE FUNCTION json_update(data json, key text, value json)
RETURNS json AS
$BODY$
   from json import loads, dumps
   if key is None: return data
   js = loads(data)
   # you must decode 'value' with loads too:
   js[key] = loads(value)
   return dumps(js)
$BODY$
LANGUAGE plpythonu VOLATILE;

postgres=# SELECT json_update('{"a":1}', 'a', '{"innerkey":"innervalue"}');
            json_update            
-----------------------------------
 {"a": {"innerkey": "innervalue"}}
(1 row)

Not only that, but using eval to decode json is dangerous and unreliable. It's unreliable because json isn't Python; it just happens to evaluate a bit like Python much of the time. It's unsafe because you never know what you might be eval'ing. In this case you are largely protected by PostgreSQL's json parser:

postgres=# SELECT json_update(
postgres(#    '{"a":1}', 
postgres(#    'a', 
postgres(#    '__import__(''shutil'').rmtree(''/glad_this_is_not_just_root'')'
postgres(# );
ERROR:  invalid input syntax for type json
LINE 4:          '__import__(''shutil'').rmtree(''/glad_this_is_not_...
                 ^
DETAIL:  Token "__import__" is invalid.
CONTEXT:  JSON data, line 1: __import__...

... but I won't be at all surprised if someone can slip an eval exploit past that. So the lesson here: don't use eval.

Solved

The problem with the above plpythonu function is that it treats "value" as a string no matter whether it's actually a complex json object. The key to solving it is to wrap value in eval():

js[key] = eval(value)

That way the json string (named 'value' in this example) loses its outer enclosing double quotes and becomes an object.
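A concrete illustration of why `eval` is unreliable here: JSON literals such as `true` and `null` are not Python names, so `eval` fails on perfectly valid JSON that `json.loads` handles fine.

```python
import json

good = '{"active": true, "note": null}'   # valid JSON, but not valid Python

decoded = json.loads(good)
print(decoded)                            # {'active': True, 'note': None}

try:
    eval(good)                            # 'true' and 'null' are undefined names
    eval_ok = True
except NameError:
    eval_ok = False
print(eval_ok)                            # False
```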

For people who want plv8 (a trusted language usable on services like Heroku): I often need to do migrations or updates on json blobs, and running a query directly on the db is much faster than downloading all the data, transforming it, and then posting an update.

CREATE EXTENSION plv8;
CREATE OR REPLACE FUNCTION json_replace_string(obj json, path text, value text, force boolean)
RETURNS json AS $$
// A null value usually means a missing source column; skip it unless forced.
if (value === null && !force) {
  return obj;
}
var nestedRe = /(\.|\[)/;                // split on '.' and '[', keeping the separator
var scrub = /]/g;
path = path.replace(scrub, '');          // drop closing brackets
var pathBits = path.split(nestedRe);     // e.g. 'a.b[5]' -> ['a', '.', 'b', '[', '5']
var len = pathBits.length;
var layer = obj;
for (var i = 0; i < len; i += 2) {       // keys sit at the even indices
  if (layer === null || layer === undefined) return obj;
  var key = pathBits[i];
  if (key === '') continue;
  if (i === len - 1) {
    layer[key] = value;                  // last segment: set the leaf value
  } else {
    // With force, create missing containers: '.' implies object, '[' implies array.
    if (force && typeof layer[key] === 'undefined') {
      layer[key] = pathBits[i+1] === '.' ? {} : [];
    }
    layer = layer[key];
  }
}
return obj;
$$ LANGUAGE plv8 IMMUTABLE;

You can use it like so:

UPDATE my_table
SET blob=json_replace_string(blob, 'some.nested.path[5].to.object', 'new value', false)
WHERE some_condition;

The force parameter serves two purposes. (1) It lets you set a null value: if you are dynamically generating the value from other columns that may not exist (e.g. blob->'non_existent_value'), null will be passed into the function, and you probably don't mean to set the value to null. (2) It forces the creation of the nested path if it doesn't already exist in the json object you are mutating, e.g.

json_replace_string('{"some_key": "some_val"}', 'other_key', 'new_val', true)

gives

{"some_key": "some_val", "other_key": "new_val"}

You can imagine similar functions to update numeric values, delete keys, etc. This basically enables Mongo-like functionality inside Postgres during the early stages of new features for quick prototyping; as our schema stabilizes, we break things out into independent columns and tables to get the best performance.
