
How to update a large (1 million+ rows) postgres column of jsonb type values

I'm trying to update a specific array inside a jsonb column called params, and I'm having trouble with how long it takes. For example, the table has rows whose params value contains an owners array:

{
  "hidden": false,
  "org_id": "34692",
  "owners": [
    "tim@facebuk.com"
  ],
  "deleted": false
}

And here is another example:

{
  "hidden": false,
  "org_id": "34692",
  "owners": [
    "tim@google.com"
  ],
  "deleted": false
}

There are roughly a million of these rows (all with different email domains in owners). I have this query, which I want to execute across all of them:

UPDATE table
SET params = CASE
    WHEN params->>'owners' NOT LIKE '%google.com%'
    THEN jsonb_set(params, '{owners}',
                   concat('"', substr(md5(random()::text), 0, 25), '@googlefake.com"')::jsonb)
    ELSE params
END

I've tested with a dataset of 100 rows and it executes quickly, but at 1000 times that size the query seems to run forever, and I have no idea whether it will ever complete. I'm not sure how to speed this up or approach it in a better way. I did try an index, e.g. CREATE INDEX ON table((params->>'owners'));, to no avail. The query has now been running for over an hour, and there are many more rows like these.

Am I indexing incorrectly? I've also looked into a GIN index, but the @> operator won't help since each owners field differs.

Avoid unnecessary updates with a WHERE clause that filters out the rows that don't need to be modified. A plain expression index can't serve a '%...%' LIKE pattern, but a partial index whose predicate matches that WHERE clause may help; see the sketch below.
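
A minimal sketch of both ideas, assuming a table named mytable (standing in for the placeholder name table above, which is a reserved word and would need quoting). It also wraps the new value in to_jsonb(ARRAY[...]) so that owners stays a JSON array instead of being replaced by a plain string, which is what the original concat(...)::jsonb produced:

-- Partial index whose predicate matches the UPDATE's WHERE clause,
-- so the planner can locate only the rows that need changing
-- (mytable and params_owners_idx are assumed names):
CREATE INDEX params_owners_idx
    ON mytable ((params->>'owners'))
    WHERE params->>'owners' NOT LIKE '%google.com%';

-- Touch only the rows that actually change:
UPDATE mytable
SET params = jsonb_set(
        params,
        '{owners}',
        to_jsonb(ARRAY[substr(md5(random()::text), 0, 25) || '@googlefake.com'])
    )
WHERE params->>'owners' NOT LIKE '%google.com%';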

You may want to run VACUUM (FULL) once the update is done, since rewriting almost every row leaves the table full of dead tuples.
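
For example (again with the assumed name mytable; note that VACUUM (FULL) rewrites the whole table and holds an exclusive lock while it runs):

VACUUM (FULL) mytable;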
