简体   繁体   中英

PostgreSQL query decomposition

I fail to decompose simple SQL queries. I use PostgreSQL but my question is also related to other RDBMS.

Consider the following example. We have table orders and we want to find first order after which total amount exceeded some limit:

drop table if exists orders cascade;

/**
Table with clients' orders
*/
create table orders(
date timestamp,
amount integer
/**
Other columns omitted
*/
);

/**
Populate with test data
*/
insert into orders(date,amount) 
values
('2011-01-01',50),
('2011-01-02',49),
('2011-01-03',2),
('2011-01-04',1000);

/**
Selects first order that caused exceeding of limit 
*/
create view first_limit_exceed
as
select min(date) from
(
    select  o1.date
    from orders o1,
         orders o2
    where o2.date<=o1.date
    group by o1.date
    having sum(o2.amount) > 100
) limit_exceed;

/**
returns "2011-01-03 00:00:00"
*/
select * from first_limit_exceed;

Now let's make the problem a little harder. Consider we want to find total amount only for rows that satisfy some predicate. We have a lot of such predicates and creating separate version of view first_limit_exceed would be terrible code duplication. So we need some way to create parameterized view and pass either filtered set of rows or predicate itself to it. In Postgres we can use query language functions as parameterized views. But Postgres does not allow function to get as argument neither set of row nor another function. I still can use string interpolation on client's side or in plpgsql function, but it is error-prone and hard to test and debug. Any advice?

In PostgreSQL 8.4 and later:

SELECT  *
FROM    (
        SELECT  *,
                SUM(amount) OVER (ORDER BY date) AS psum
        FROM    orders
        ) q
WHERE   psum > 100
ORDER BY
        date
LIMIT 1

Add any predicates you want into the inner query:

SELECT  *
FROM    (
        SELECT  *,
                SUM(amount) OVER (ORDER BY date) AS psum
        FROM    orders
        WHERE   date >= '2011-01-03'
        ) q
WHERE   psum > 100
ORDER BY
        date
LIMIT 1

It sounds a bit like you're trying to put too much code into the database. If you are interested in the rows of a certain relation that satisfy a particular predicate, just execute a select statement with an appropriate where clause in the client code. Having views that take predicates as parameters is reinventing the wheel that sql already solves nicely.

On the other hand, I can see an argument for storing queries themselves in the database, so that they can be composed into larger reports. This two is still better handled by application code. I might approach a problem like that by using a library that's good at dynamic sql generatation, (for example sqlalchemy), and then storing the query representations (sqlalchemy expression objects are 'pickleable') as blobs in the database.

To put it another way, databases are representers of facts , You store knowledge in them. applications have the duty of acting on user requests , When you find yourself defining transformations on the data, that's really more a matter of anticipating and implementing the requests of actual users, rather than just faithfully preserving knowledge.

Views are best used when the schema inevitably changes, so you can leave older applications that don't need to know about the new schema in a working state.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM