
PostgreSQL: Slow performance of user-defined function

My function stat() reads from two tables on PostgreSQL 11.

Table T has ~1,000,000 rows and table D has ~3,000 rows.

stat() takes about 1.5 seconds to run, which is too slow for my use case:

select * from stat('2019-01-01', '2019-10-01','UTC');

To improve performance I tried creating different indexes (code below), but it did not help.

I was able to improve performance by hardcoding the literals '2019-01-01' and '2019-10-01' in place of time_start and time_end in the body of stat().

In that case it runs in 0.5 sec, but this is not a solution.
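
To compare the parameterized and hardcoded variants, it can help to look at the plans PostgreSQL actually runs inside the function. The following is a minimal sketch, assuming the auto_explain contrib module is installed and the session is allowed to load it (typically a superuser). The plans are written to the server log, not returned to the client.

```sql
-- Load auto_explain for this session only (assumes the contrib module is available).
LOAD 'auto_explain';
-- Log the plan of every statement, regardless of how long it runs.
SET auto_explain.log_min_duration = 0;
-- Include actual row counts and timings, as EXPLAIN ANALYZE would.
SET auto_explain.log_analyze = on;
-- Also log statements executed inside functions such as stat().
SET auto_explain.log_nested_statements = on;

SELECT * FROM stat('2019-01-01', '2019-10-01', 'UTC');
```

Running this once for each variant of the function body shows whether the two use different plans or row estimates.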

CREATE TABLE T(
  id SERIAL PRIMARY KEY,
  time TIMESTAMP WITH TIME zone NOT NULL,
  ext_id INTEGER
); 

CREATE TABLE D(
  id SERIAL PRIMARY KEY,
  time TIMESTAMP WITH TIME zone NOT NULL,
  ext_id INTEGER NOT NULL
);

CREATE INDEX t_time_idx ON T(time);
CREATE INDEX d_time_idx ON D(time);
CREATE INDEX t_ext_idx ON T(ext_id);
CREATE INDEX d_ext_idx ON D(ext_id);

CREATE OR REPLACE FUNCTION stat(time_start varchar, time_end varchar, tz varchar)
RETURNS TABLE (result float)
AS $$
DECLARE
   time_points INTEGER;
   capacity INTEGER;
BEGIN
   time_points := 1000; 
   capacity := 12;
RETURN QUERY
SELECT (total::float / (capacity * time_points))::float  as result
FROM (
   SELECT count(*)::float AS total FROM T
   INNER JOIN (
    SELECT * FROM (
      SELECT ext_id, ROW_NUMBER() OVER(PARTITION BY ext_id ORDER BY time desc) AS rk
           FROM D WHERE time at time zone tz < time_end::timestamp
       ) InB WHERE rk = 1
   ) D_INFO
   ON T.ext_id = D_INFO.ext_id
   WHERE T.time at time zone tz between time_start::timestamp and time_end::timestamp
     ) B;
END;
$$
LANGUAGE plpgsql;

Usage:

select * from stat('2019-01-01', '2019-10-01','UTC');  --> takes 1.5 sec, too slow

What I tried:

ANALYZE T;
ANALYZE D;

I also created several additional indexes:

CREATE INDEX covering_t_time_ext_idx ON t(ext_id) INCLUDE (time);
CREATE INDEX t_time_ext_idx ON T(time) INCLUDE (ext_id);
CREATE INDEX t_time_ext_multicolumn_idx ON t(time, ext_id);
CREATE INDEX t_time_ext_multicolumn2_idx ON t(ext_id, time);

None of them improved performance.
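
One possible reason the indexes on time make no difference, offered here as an assumption and not something confirmed in the original post: the filter applies AT TIME ZONE to the column itself, so the predicate is on an expression and not on the indexed column. Below is a sketch of the same range filter with the conversion moved to the bounds. For zones with daylight-saving transitions the two forms can differ around the switch, so the rewrite should be checked against real data.

```sql
-- Original form: the column is wrapped in an expression,
-- which a plain b-tree index on T(time) cannot serve directly.
--   WHERE T.time AT TIME ZONE tz BETWEEN time_start::timestamp AND time_end::timestamp

-- Sketch: convert the bounds to timestamptz once and compare the bare column.
-- Literal values stand in for the function parameters.
SELECT count(*)
FROM T
WHERE T.time BETWEEN ('2019-01-01'::timestamp AT TIME ZONE 'UTC')
                 AND ('2019-10-01'::timestamp AT TIME ZONE 'UTC');
```

Prefixing the query with EXPLAIN shows whether t_time_idx is chosen for this form.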

A suggested rewrite of the function, using EXISTS in place of the join on the ranked subquery:

CREATE OR REPLACE FUNCTION stat(time_start varchar, time_end varchar, tz varchar)
RETURNS TABLE (result float)
AS $$
DECLARE
   time_points INTEGER;
   capacity INTEGER;
BEGIN
   time_points := 1000; 
   capacity := 12;
RETURN QUERY
SELECT (total::float / (capacity * time_points))::float  as result
FROM (
   SELECT count(*)::float AS total 
   FROM T
   WHERE T.time at time zone tz between time_start::timestamp and time_end::timestamp
   AND EXISTS (
      SELECT 1
      FROM D 
      WHERE D.ext_id = T.ext_id
      AND D.time at time zone tz < time_end::timestamp
   )
) B;
END;
$$
LANGUAGE plpgsql;

I solved this by casting the input parameters:

(time_start varchar, time_end varchar)

into intermediate variables with type timestamp:

 DECLARE
    start_time timestamp;
    end_time timestamp;

 BEGIN
    start_time := time_start::timestamp;
    end_time   := time_end::timestamp;

and using these intermediate variables in the SQL instead of doing the casting inside the SQL.
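
Put together, the change might look like the sketch below. It assumes the fix is applied to the original join-based query. The unused name column is dropped because table D as defined above has no such column, and D.time is qualified for readability.

```sql
CREATE OR REPLACE FUNCTION stat(time_start varchar, time_end varchar, tz varchar)
RETURNS TABLE (result float)
AS $$
DECLARE
   time_points INTEGER := 1000;
   capacity    INTEGER := 12;
   -- Intermediate variables: the varchar-to-timestamp casts happen once,
   -- outside the query.
   start_time  timestamp;
   end_time    timestamp;
BEGIN
   start_time := time_start::timestamp;
   end_time   := time_end::timestamp;

   RETURN QUERY
   SELECT (total::float / (capacity * time_points))::float AS result
   FROM (
      SELECT count(*)::float AS total
      FROM T
      INNER JOIN (
         -- Latest D row per ext_id before end_time.
         SELECT * FROM (
            SELECT D.ext_id,
                   ROW_NUMBER() OVER (PARTITION BY D.ext_id ORDER BY D.time DESC) AS rk
            FROM D
            WHERE D.time AT TIME ZONE tz < end_time
         ) InB
         WHERE rk = 1
      ) D_INFO ON T.ext_id = D_INFO.ext_id
      WHERE T.time AT TIME ZONE tz BETWEEN start_time AND end_time
   ) B;
END;
$$
LANGUAGE plpgsql;
```

The call site stays the same: select * from stat('2019-01-01', '2019-10-01','UTC');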
