
Set the time property as the M dimension of a PostGIS geometry or as a separate attribute

Basic version info first:

psql (PostgreSQL) 12.7 (Ubuntu 12.7-1.pgdg18.04+1)
postgis          | 3.1.1

My purpose in using a spatial database is to quickly query GPS trajectories within a specified time range and spatial boundary. Currently, the basic information about my data is as follows:

-- geometry table column (there are 50,000 rows in table mpart5w-wkt)
test=# \d "mpart5w-wkt"
                       Table "public.mpart5w-wkt"
  Column   |            Type            | Collation | Nullable | Default
-----------+----------------------------+-----------+----------+---------
 driver_id | character varying          |           |          |
 order_id  | character varying          |           |          |
 geom      | geometry(LineStringM,4326) |           |          |
Indexes:
    "mpart5w-wkt_driver_id_idx" btree (driver_id)
    "mpart5w-wkt_geom_idx" gist (geom gist_geometry_ops_nd)


-- meta info
test=# select * from geometry_columns where f_table_name='mpart5w-wkt';

 f_table_catalog | f_table_schema | f_table_name | f_geometry_column | coord_dimension | srid |    type
-----------------+----------------+--------------+-------------------+-----------------+------+-------------
 test            | public         | mpart5w-wkt  | geom              |               3 | 4326 | LINESTRINGM
(1 row)


-- sample data: LINESTRING M (lon lat timestamp)
test=# select st_astext(geom) from "mpart5w-wkt" limit 1;

 LINESTRING M (104.04538 30.70745 1538402919,104.04538 30.70744 1538402928,104.04537 30.70745 1538402938,104.04536 30.70743 1538402948,104.04537 30.7074 1538402958, ...)

Just to emphasize, geom is of type geometry(LineStringM, 4326). A GiST index has been built on the geom column.

The first question is whether the M dimension is supported by multi-dimensional indexes.

I checked the official manual on multi-dimensional indexes, which shows that we can get a 4D BRIN index using the 4D operator class:

CREATE INDEX [indexname] ON [tablename]
    USING BRIN ([geome_col] brin_geometry_inclusion_ops_4d);

At the same time, we can get an n-dimensional GiST index for the geometry type using this syntax:

CREATE INDEX [indexname] ON [tablename] USING GIST ([geometryfield] gist_geometry_ops_nd);

So I guess it is helpful to build a 4D GiST index as long as the Z and M dimensions are provided.

Most function descriptions state "This function supports 3d and will not drop the z-index." without mentioning the M coordinate, and I have found nothing more about how M is handled.

In the end, I found no solid evidence of whether the M dimension is covered by multi-dimensional indexes, or of how to use such an index on the M dimension.

Maybe I should create a table like this, so that I no longer need to deal with the M dimension?
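
For what it is worth, here is a minimal sketch of how an n-D index such as "mpart5w-wkt_geom_idx" might be queried on X, Y and M together. It relies on the PostGIS &&& operator (n-D bounding box intersection) and assumes that &&& takes the M range into account for XYM geometries. The epoch values in the query window are made up for illustration.

```sql
-- Query window: a lon/lat box plus a time range, expressed as a two-point
-- LINESTRING M whose n-D bounding box spans the window.
-- The M values (epoch seconds) are hypothetical.
SELECT count(order_id)
FROM "mpart5w-wkt"
WHERE geom &&& ST_SetSRID(
    ST_MakeLine(
        ST_MakePointM(104.067, 30.657, 1538402400),  -- min lon, min lat, start time
        ST_MakePointM(104.083, 30.671, 1538406000)   -- max lon, max lat, end time
    ),
    4326
);
```

&&& compares bounding boxes only, so this is a coarse filter; an exact test on the matching rows would still be needed afterwards.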

create table "part5w-wkt"(
    driver_id varchar,
    order_id varchar,
    geom geometry(Linestring, 4326), 
    min_time timestamp,
    max_time timestamp
);

-- example (both start_time and end_time are parameters)
select * from "part5w-wkt" 
where st_intersects(
    geom, 
    ST_MakeEnvelope(104.067, 30.657, 104.083, 30.671, 4326)
) and (
    (min_time < start_time and start_time < max_time) 
     or
    (min_time < end_time and end_time < max_time)
)
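
A note on the time filter above: it matches only when start_time or end_time falls strictly inside a trajectory's time span, so a trajectory that lies entirely within the query window would be skipped. A minimal sketch of the usual interval-overlap test follows. The two timestamp literals stand in for the start_time and end_time parameters and are made up for illustration.

```sql
-- "part5w-wkt" is the 2D table defined above, with min_time and max_time columns.
SELECT *
FROM "part5w-wkt"
WHERE ST_Intersects(
          geom,
          ST_MakeEnvelope(104.067, 30.657, 104.083, 30.671, 4326)
      )
  -- Two intervals overlap when each one starts before the other ends.
  AND min_time <= timestamp '2018-10-01 15:00:00'   -- end_time (hypothetical value)
  AND max_time >= timestamp '2018-10-01 14:00:00';  -- start_time (hypothetical value)
```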

The second question is how to use a 2D index with a bounding box.

Since there is no evidence that using the M dimension with a GiST index is more convenient than using a 2D geometry with separate time attributes, I decided to test the 2D index first.

-- test 1
explain analyze
select count(order_id) from "mpart5w-wkt" 
where st_intersects(
    st_force2d(geom), 
    ST_MakeEnvelope(104.067, 30.657, 104.083, 30.671, 4326)
);

-- test 2
explain analyze
select count(order_id) from "mpart5w-wkt" 
where st_intersects(
    st_force2d(geom), 
    st_geometryfromtext(
        'polygon((104.067 30.671, 104.083 30.671, 104.083 30.657, 104.067 30.657, 104.067 30.671))', 
        4326
    )
);

-- test 3
explain analyze
select count(order_id) from "mpart5w-wkt" 
where st_intersects(
    st_force2d(geom), 
    'SRID=4326;polygon((104.067 30.671, 104.083 30.671, 104.083 30.657, 104.067 30.657, 104.067 30.671))'::geometry
);

-- almost the same result
 Finalize Aggregate  (cost=547292.05..547292.06 rows=1 width=8) (actual time=817.698..824.482 rows=1 loops=1)
   ->  Gather  (cost=547291.84..547292.05 rows=2 width=8) (actual time=817.380..824.451 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial Aggregate  (cost=546291.84..546291.85 rows=1 width=8) (actual time=804.706..804.707 rows=1 loops=3)
               ->  Parallel Seq Scan on "mpart5w-wkt"  (cost=0.00..546291.83 rows=2 width=19) (actual time=97.585..803.734 rows=4394 loops=3)
                     Filter: st_intersects(st_force2d(geom), '0103000020E610000001000000050000003F355EBA49045A40D578E92631A83E403F355EBA49045A40B29DEFA7C6AB3E405A643BDF4F055A40B29DEFA7C6AB3E405A643BDF4F055A40D578E92631A83E403F355EBA49045A40D578E92631A83E40'::geometry)
                     Rows Removed by Filter: 12272
 Planning Time: 0.268 ms
 JIT:
   Functions: 17
   Options: Inlining true, Optimization true, Expressions true, Deforming true
   Timing: Generation 2.771 ms, Inlining 99.859 ms, Optimization 117.378 ms, Emission 74.123 ms, Total 294.132 ms
 Execution Time: 825.745 ms
(14 rows)

However, the tests show that the results are almost the same, and the spatial index was not used even though I ran many different tests. Removing the st_force2d() function made the query even slower.

I expect the efficiency to be lower still for the same query with additional time constraints.

By the way, which of 4326 and 3857 should I use as the SRID for storing GPS trajectory geometries if lon-lat bounding boxes are used frequently and distances also need to be computed?
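
One option, sketched here only for comparison: keep the data in 4326 so that lon-lat bounding boxes work directly, and cast to geography when a length or distance in meters is needed. Distances measured in 3857 are stretched by the Web Mercator projection, increasingly so away from the equator. The query is an illustration, not a benchmark.

```sql
-- Trajectory length in meters, computed on the spheroid via the geography type.
-- ST_Force2D drops the M (timestamp) coordinate before the cast.
SELECT order_id,
       ST_Length(ST_Force2D(geom)::geography) AS length_m
FROM "mpart5w-wkt"
LIMIT 10;
```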

You can have the index store geom as 2D by using the function ST_Force2D in the index creation, so that the database doesn't need to do it at query time:

CREATE INDEX idx_part5w_wkt_geom ON "part5w-wkt" 
USING gist (ST_Force2D(geom) gist_geometry_ops_nd);

It will have a similar effect if you just omit the ST_Force2D in the CREATE INDEX as long as you also don't use it later on in the WHERE clause. Long story short: the way columns are indexed and how they're queried have to match, otherwise the index is probably not going to be used.
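
As a minimal sketch of that matching rule, assume the expression index is created on "mpart5w-wkt" (the table queried in the tests above) under a hypothetical index name. The default 2D operator class is used here instead of gist_geometry_ops_nd, since the indexed expression is already 2D.

```sql
-- Hypothetical index name; the indexed expression is ST_Force2D(geom).
CREATE INDEX idx_mpart5w_wkt_geom_2d ON "mpart5w-wkt"
USING gist (ST_Force2D(geom));

-- Refresh planner statistics after building the index.
ANALYZE "mpart5w-wkt";

-- The WHERE clause repeats the indexed expression exactly.
EXPLAIN ANALYZE
SELECT count(order_id) FROM "mpart5w-wkt"
WHERE ST_Intersects(
    ST_Force2D(geom),
    ST_MakeEnvelope(104.067, 30.657, 104.083, 30.671, 4326)
);
```

If the planner picks the index, the plan should show an index or bitmap scan on it in place of the Parallel Seq Scan from the question.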

Demo: db<>fiddle
