
Using ST_MakePoint for dataset with over 1 billion rows

I have a global dataset in my Postgres database (9.2.4, with PostGIS 2.1.0SVN) with ~1.1 billion rows. My aim is to extract the relevant rows using a polygon. The following query has been running for a day:

UPDATE table SET geom = ST_SetSRID(ST_MakePoint(long,lat),4326) where lat !=666 ;

666 was the placeholder for missing values. The lat column has a btree index.

free -m gives the following stats for RAM:

total       used       free     shared    buffers     cached
Mem:         24104      23829        275          0          5      22738
-/+ buffers/cache:       1084      23020
Swap:        24574        309      24265

htop shows almost no CPU load and about 9% memory usage.

Is the query still running, or is it somehow on hold because of a lack of RAM?
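One way to answer this is to inspect the server from a second session. A minimal sketch for PostgreSQL 9.2 (where pg_stat_activity exposes a boolean waiting column; newer versions report wait_event instead), assuming the UPDATE text is unique enough to match on:

```sql
-- Check from another session whether the UPDATE backend is still
-- active or blocked waiting on a lock.
SELECT pid,
       state,
       waiting,
       now() - query_start AS runtime,
       query
FROM   pg_stat_activity
WHERE  query ILIKE '%ST_MakePoint%';
```

If state is 'active' and waiting is false, the query is still doing work; an UPDATE touching ~1.1 billion rows rewrites every matching row, so a runtime of a day is plausible rather than a sign of a hang.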

Any comment or hint is appreciated.

It's better to use CURSORs and process your dataset in batches of roughly 1000 rows.

Documentation about cursors is here: https://www.postgresqltutorial.com/plpgsql-cursor/
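A minimal sketch of the batching idea without an explicit cursor, assuming the table has a primary key column named id (the name, the batch size of 1000, and the geom IS NULL progress check are illustrative, not from the original answer). Run the statement repeatedly, each in its own transaction, until it reports 0 rows updated:

```sql
-- Hypothetical batched update: convert ~1000 rows per statement.
-- "geom IS NULL" marks the rows not yet processed, so reruns
-- pick up where the previous batch left off.
UPDATE my_table
SET    geom = ST_SetSRID(ST_MakePoint(long, lat), 4326)
WHERE  id IN (
    SELECT id
    FROM   my_table
    WHERE  lat != 666
      AND  geom IS NULL
    LIMIT  1000
);
```

Small batches keep each transaction short, so locks are held briefly and a failure only rolls back one batch instead of a day of work.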

If possible, never use UPDATE as a mass operation. For that purpose CTAS (create table as select), followed by dropping the old table and renaming the new one, is much, much better. First, create the new table with CTAS:

Create table "new_table" as
select column_1, column_2, ..., column_n,
       ST_SetSRID(ST_MakePoint(long, lat), 4326) as geom
from "old_table"
where lat != 666;

Check your results if necessary.

Now drop the old table and rename the new one:

drop table "old_table";
alter table "new_table" rename to "old_table";

Create all needed indexes, foreign keys, etc. for the renamed table.
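For example, since the stated goal is to filter points by a polygon, the new geometry column should get a spatial index (index names here are illustrative):

```sql
-- GiST index so polygon filters (&&, ST_Within, ST_Intersects)
-- can use an index scan instead of reading all ~1.1B rows.
CREATE INDEX old_table_geom_gist ON "old_table" USING GIST (geom);

-- Recreate the btree index the original table had on lat.
CREATE INDEX old_table_lat_idx ON "old_table" (lat);

-- Refresh planner statistics for the freshly built table.
ANALYZE "old_table";
```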

If there are foreign keys and other constraints on old_table, you can disable its triggers first:

alter table old_table disable trigger all;
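If you go that route, remember that DISABLE TRIGGER ALL also suspends foreign-key enforcement, so re-enable the triggers as soon as the swap is done:

```sql
-- Suspend trigger-based checks (including FK enforcement) for the swap...
ALTER TABLE old_table DISABLE TRIGGER ALL;
-- ...perform the drop/rename here...
-- ...then restore normal constraint enforcement.
ALTER TABLE old_table ENABLE TRIGGER ALL;
```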
