In SQL how to select previous rows based on the current row values?

I have a simple SQL table that looks like this:

CREATE TABLE msg (
    from_person character varying(10),
    from_location character varying(10),
    to_person character varying(10),
    to_location character varying(10),
    msglength integer,
    ts timestamp without time zone
);

Sample data

For each row in the table, I want to find out whether a different 'from_person' and 'from_location' has interacted with the 'to_person' of the current row in the last 3 minutes.

For example, in the table above, for row #4, other than mary from Mumbai (current row), nancy from NYC and bob from Barcelona have also sent a message to charlie in the last 3 minutes, so the count is 2.

Similarly, for row #2, other than bob from Barcelona (current row), only nancy from NYC has sent a message to charlie in ca (current row), so the count is 1.

Example desired output:

0
1
0
2

I tried using a window function, but it seems that in the frame clause I can only specify a number of rows before and after, not a time interval.

Every table in Postgres should have a primary key. It would be great if you had a primary key defining the expected order of rows.

Example data:

create table msg (
    id int primary key,
    from_person text,
    to_person text,
    ts timestamp without time zone
);

insert into msg values
(1, 'nancy',   'charlie', '2016-02-01 01:00:00'),
(2, 'bob',     'charlie', '2016-02-01 01:00:00'),
(3, 'charlie', 'nancy',   '2016-02-01 01:00:01'),
(4, 'mary',    'charlie', '2016-02-01 01:02:00');

The query:

select m1.id, count(m2)
from msg m1
left join msg m2
on m2.id < m1.id
and m2.to_person = m1.to_person
and m2.ts >= m1.ts- '3m'::interval
group by 1
order by 1;

 id | count 
----+-------
  1 |     0
  2 |     1
  3 |     0
  4 |     2
(4 rows)

In the absence of a primary key, you can use the row_number() function, for example:

with msg_with_rn as (
    select *, row_number() over (order by ts, from_person desc) rn
    from msg
    )
select m1.rn, count(m2)  -- rn stands in for the missing id
from msg_with_rn m1
left join msg_with_rn m2
on m2.rn < m1.rn
and m2.to_person = m1.to_person
and m2.ts >= m1.ts- '3m'::interval
group by 1
order by 1;

Note that I have used row_number() over (order by ts, from_person desc) to get the sequence of rows as you have presented in the question. Of course, you should decide yourself how to resolve ambiguities arising from the same values of the column ts (as in the first two rows).

This should more or less do it. Depending on your requirements, you may need to modify the middle two conditions in the where clause:

select *,
   (select count(*) from msg m2
    where m2.to_person = m1.to_person
        and m2.from_person != m1.from_person
        and m2.from_location != m1.from_location
        and abs(EXTRACT(EPOCH FROM (m1.ts - m2.ts))) <= 3*60)
from msg m1

Building on your actual question, this would be a correct answer:

SELECT count(m2.to_person) AS ct_3min
FROM   msg m1
LEFT   JOIN msg m2
   ON   m2.to_person = m1.to_person
   AND (m2.from_person, m2.from_location) <> (m1.from_person, m1.from_location)
   AND  m2.ts <= m1.ts   -- including same timestamp (?)
   AND  m2.ts >= m1.ts - interval '3 min'
GROUP  BY m1.ctid
ORDER  BY m1.ctid;

This assumes to_person, from_person and from_location are all defined NOT NULL.

Returns:

1   -- !!
1
0
2

Note that the result is basically meaningless without additional columns, any unique combination of columns, ideally a PK. I return the rows in the current physical order - which can change any time without warning. There is no natural order of rows in a relational table. Without an unambiguous ORDER BY clause, the order of result rows is unreliable.

According to your definition, the first two rows (in your displayed order) must have the same result: 1, or 0 if you don't count rows with the same timestamp. Returning 0 for one and 1 for the other would be incorrect.
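To sanity-check those counts, here is a minimal runnable sketch in Python that uses the standard-library sqlite3 module as a stand-in for Postgres. It illustrates the same logic under several assumptions and is not the exact query above: SQLite has no interval type, so datetime(ts, '-3 minutes') replaces ts - interval '3 min'; the row-value comparison is spelled out as NOT (... AND ...); an id column replaces ctid; the timestamps are the ones used in the example data earlier on this page; and the from_location values are taken from the question's prose, with 'ca' for charlie being a guess.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE msg (id INTEGER PRIMARY KEY, from_person TEXT NOT NULL, "
    "from_location TEXT NOT NULL, to_person TEXT NOT NULL, ts TEXT NOT NULL)"
)
# Assumed sample rows; locations are reconstructed from the question's text.
conn.executemany(
    "INSERT INTO msg VALUES (?, ?, ?, ?, ?)",
    [
        (1, "nancy", "NYC", "charlie", "2016-02-01 01:00:00"),
        (2, "bob", "Barcelona", "charlie", "2016-02-01 01:00:00"),
        (3, "charlie", "ca", "nancy", "2016-02-01 01:00:01"),
        (4, "mary", "Mumbai", "charlie", "2016-02-01 01:02:00"),
    ],
)

# Count other senders to the same recipient within the preceding 3 minutes,
# including rows that share the current row's timestamp.
query = """
SELECT m1.id, count(m2.id) AS ct_3min
FROM   msg m1
LEFT   JOIN msg m2
   ON  m2.to_person = m1.to_person
   AND NOT (m2.from_person = m1.from_person
            AND m2.from_location = m1.from_location)
   AND m2.ts <= m1.ts
   AND m2.ts >= datetime(m1.ts, '-3 minutes')
GROUP  BY m1.id
ORDER  BY m1.id
"""
result = conn.execute(query).fetchall()
print(result)  # [(1, 1), (2, 1), (3, 0), (4, 2)]
```

Rows 1 and 2 share a timestamp and a recipient, so each counts the other, which is why both get 1 rather than 0 and 1.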

In the absence of any unique key, I am using the ctid as a poor man's surrogate key.

You should still have a primary key defined in your table, but it's by no means compulsory. That's not the only dubious detail in your table layout. You should probably operate with timestamp with time zone, have some NOT NULL constraints and only person_id columns referencing a person table in a properly normalized design. Something like:

CREATE TABLE msg (
   msg_id         serial PRIMARY KEY
 , from_person_id integer NOT NULL REFERENCES person
 , to_person_id   integer NOT NULL REFERENCES person
 , msglength      integer
 , ts             timestamp with time zone
);

Either way, relying on a surrogate PK for the purpose of your query would be plain wrong. The "next" msg_id does not even have to have a later timestamp. In a multi-user database a sequence does not guarantee anything of the sort.
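A small sketch of that failure mode, again in Python with the standard-library sqlite3 module standing in for Postgres (so datetime(ts, '-3 minutes') replaces the interval arithmetic). The two rows are hypothetical: the message with the lower id carries the later timestamp, as can happen when concurrent transactions draw sequence values in one order and commit in another.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE msg (id INTEGER PRIMARY KEY, from_person TEXT, "
    "to_person TEXT, ts TEXT)"
)
# Hypothetical data: id order and timestamp order disagree.
conn.executemany(
    "INSERT INTO msg VALUES (?, ?, ?, ?)",
    [
        (1, "nancy", "charlie", "2016-02-01 01:02:00"),
        (2, "bob", "charlie", "2016-02-01 01:00:00"),
    ],
)

# "Previous" decided by id: treats message 1 as earlier than message 2.
by_id = conn.execute("""
    SELECT m1.id, count(m2.id)
    FROM   msg m1
    LEFT   JOIN msg m2
       ON  m2.id < m1.id
       AND m2.to_person = m1.to_person
       AND m2.ts >= datetime(m1.ts, '-3 minutes')
    GROUP  BY m1.id
    ORDER  BY m1.id
""").fetchall()

# "Previous" decided by timestamp only.
by_ts = conn.execute("""
    SELECT m1.id, count(m2.id)
    FROM   msg m1
    LEFT   JOIN msg m2
       ON  m2.to_person = m1.to_person
       AND m2.from_person <> m1.from_person
       AND m2.ts <= m1.ts
       AND m2.ts >= datetime(m1.ts, '-3 minutes')
    GROUP  BY m1.id
    ORDER  BY m1.id
""").fetchall()

print(by_id)  # [(1, 0), (2, 1)]
print(by_ts)  # [(1, 1), (2, 0)]
```

The id-based join credits bob's message with a "previous" message from nancy that was actually sent two minutes later, while the timestamp-based join assigns the count to the row that really came second.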
