SQL Select Inner join one by one
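I have a specific requirement for a database (PostgreSQL v9.4.5), and I can't see an elegant pure-SQL solution for it (I know I could do this with Python or something else, but I have billions of rows and the computation time would increase dramatically).
I have two tables: trades and events. Both tables represent trades that happen in an order book during a day (which is why I have billions of rows; my data spans several years), but there are many more events than trades.
Both tables have time, volume and price columns, but each has other columns (say foo and bar, respectively) holding specific information. I want to match the two tables on the columns time, volume and price, because I know this correspondence exists as an injection from trades into events (if trades has n rows with the same time t, the same price p and the same volume v, I know events also has n rows with time t, price p and volume v).
Trades: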
id | time | price | volume | foo
-----+-----------+---------+--------+-------
201 | 32400.524 | 53 | 2085 | xxx
202 | 32400.530 | 53 | 1162 | xxx
203 | 32400.531 | 52.99 | 50 | xxx
204 | 32400.532 | 52.91 | 3119 | xxx
205 | 32400.837 | 52.91 | 3119 | xxx <--
206 | 32400.837 | 52.91 | 3119 | xxx <--
207 | 32400.837 | 52.91 | 3119 | xxx <--
208 | 32400.839 | 52.92 | 3220 | xxx <--
209 | 32400.839 | 52.92 | 3220 | xxx <--
210 | 32400.839 | 52.92 | 3220 | xxx <--
Events:
id | time | price | volume | bar
-----+-----------+---------+--------+------
328 | 32400.835 | 52.91 | 3119 | yyy
329 | 32400.837 | 52.91 | 3119 | yyy <--
330 | 32400.837 | 52.91 | 3119 | yyy <--
331 | 32400.837 | 52.91 | 3119 | yyy <--
332 | 32400.838 | 52.91 | 3119 | yyy
333 | 32400.838 | 52.91 | 3119 | yyy
334 | 32400.839 | 52.92 | 3220 | yyy <--
335 | 32400.839 | 52.92 | 3220 | yyy <--
336 | 32400.839 | 52.92 | 3220 | yyy <--
337 | 32400.840 | 52.91 | 2501 | yyy
What I want is:
time | price | volume | foo | bar
-----------+---------+--------+------+-------
32400.837 | 52.91 | 3119 | xxx | yyy
32400.837 | 52.91 | 3119 | xxx | yyy
32400.837 | 52.91 | 3119 | xxx | yyy
32400.839 | 52.92 | 3220 | xxx | yyy
32400.839 | 52.92 | 3220 | xxx | yyy
32400.839 | 52.92 | 3220 | xxx | yyy
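I can't do a classic INNER JOIN, otherwise I get every possible combination between the matching rows of the two tables (here 3x3 + 3x3, so 18 rows instead of 6).
I want a one-to-one correspondence, even though several rows can share the same values.
Thanks for your help.
EDIT:
As I said, if I use a classic INNER JOIN, for example
SELECT * FROM events e
INNER JOIN trades t
ON t.time = e.time AND t.price = e.price AND t.volume = e.volume
I get something like: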
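The matching assumption stated in the question (every group of identical time, price and volume values in trades has the same number of rows in events) can be checked before attempting any pairing. The following is only a sketch, assuming the tables are named trades and events with the columns shown above; the aliases n, trade_rows and event_rows are illustrative and not part of the original schema.

```sql
-- Sketch: list (time, price, volume) groups of trades whose row count
-- differs from the row count of the same group in events.
SELECT t.time,
       t.price,
       t.volume,
       t.n              AS trade_rows,
       COALESCE(e.n, 0) AS event_rows
FROM (SELECT time, price, volume, count(*) AS n
      FROM trades
      GROUP BY time, price, volume) t
LEFT JOIN
     (SELECT time, price, volume, count(*) AS n
      FROM events
      GROUP BY time, price, volume) e
  ON  e.time = t.time AND e.price = t.price AND e.volume = t.volume
-- Keep only the groups that break the assumption.
WHERE e.n IS NULL OR e.n <> t.n;
```

An empty result is consistent with the assumption. On the short excerpt shown in the question it would flag trades 201 to 204, whose timestamps fall before the first event listed.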
trade_id | event_id | time | price | volume | foo | bar
---------+----------+-----------+---------+--------+------+-------
205 | 329 | 32400.837 | 52.91 | 3119 | xxx | yyy
205 | 330 | 32400.837 | 52.91 | 3119 | xxx | yyy
205 | 331 | 32400.837 | 52.91 | 3119 | xxx | yyy
206 | 329 | 32400.837 | 52.91 | 3119 | xxx | yyy
206 | 330 | 32400.837 | 52.91 | 3119 | xxx | yyy
206 | 331 | 32400.837 | 52.91 | 3119 | xxx | yyy
207 | 329 | 32400.837 | 52.91 | 3119 | xxx | yyy
207 | 330 | 32400.837 | 52.91 | 3119 | xxx | yyy
207 | 331 | 32400.837 | 52.91 | 3119 | xxx | yyy
208 | 334 | 32400.839 | 52.92 | 3220 | xxx | yyy
208 | 335 | 32400.839 | 52.92 | 3220 | xxx | yyy
208 | 336 | 32400.839 | 52.92 | 3220 | xxx | yyy
209 | 334 | 32400.839 | 52.92 | 3220 | xxx | yyy
209 | 335 | 32400.839 | 52.92 | 3220 | xxx | yyy
209 | 336 | 32400.839 | 52.92 | 3220 | xxx | yyy
210 | 334 | 32400.839 | 52.92 | 3220 | xxx | yyy
210 | 335 | 32400.839 | 52.92 | 3220 | xxx | yyy
210 | 336 | 32400.839 | 52.92 | 3220 | xxx | yyy
But what I want is:
trade_id | event_id | time | price | volume | foo | bar
---------+----------+-----------+---------+--------+------+-------
205 | 329 | 32400.837 | 52.91 | 3119 | xxx | yyy
206 | 330 | 32400.837 | 52.91 | 3119 | xxx | yyy
207 | 331 | 32400.837 | 52.91 | 3119 | xxx | yyy
208 | 334 | 32400.839 | 52.92 | 3220 | xxx | yyy
209 | 335 | 32400.839 | 52.92 | 3220 | xxx | yyy
210 | 336 | 32400.839 | 52.92 | 3220 | xxx | yyy
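To see where the extra rows of a plain join come from, it can help to count how many joined rows each (time, price, volume) group produces. A minimal sketch, assuming the same trades and events tables as in the question; joined_rows is an illustrative alias.

```sql
-- Sketch: count the rows a plain INNER JOIN produces per group of
-- identical (time, price, volume) values.
SELECT t.time,
       t.price,
       t.volume,
       count(*) AS joined_rows
FROM events e
INNER JOIN trades t
  ON t.time = e.time AND t.price = e.price AND t.volume = e.volume
GROUP BY t.time, t.price, t.volume
-- Only groups where the join multiplies rows are of interest.
HAVING count(*) > 1;
```

A group with n identical trades and n identical events yields n x n joined rows, so on the sample data the two marked groups would each report 9 rows, 18 in total instead of the 6 wanted.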
Check this query:
SELECT Events.*,Trades.*
FROM Events
INNER JOIN Trades
ON Trades.time = Events.time
AND Trades.price = Events.price
AND Trades.volume = Events.volume
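Try this and let me know. We could also use a row_number() over (partition by ...) clause, but I'm not sure whether it works in PostgreSQL. Give it a try anyway.
-- One row per trade, but every trade in a duplicate group is paired
-- with the same (lowest) event id.
SELECT
min(t.id) as trade_id, min(e.id) as event_id,
min(t.time) as time, min(t.price) as price,
min(t.volume) as volume, min(e.bar) as bar,
min(t.foo) as foo
FROM events e
INNER JOIN trades t
ON t.time = e.time AND t.price = e.price AND t.volume = e.volume
GROUP BY t.id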
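Looking only at the sample data you provided, one option is:
-- One row per event, but every event in a duplicate group is paired
-- with the same (lowest) trade id.
SELECT e.id, min(t.id), e.time, e.price, e.volume, min(e.bar), min(t.foo) FROM events e
INNER JOIN trades t
ON t.time = e.time AND t.price = e.price AND t.volume = e.volume
GROUP BY e.id, e.time, e.price, e.volume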
Here is my row_number example.
Also on SQL Fiddle: SO 33608351
with
trades AS
(
select 201 as id, 32400.524 as time, 53 as price, 2085 as volume, 'xxx' as foo union all
select 202, 32400.530, 53, 1162, 'xxx' union all
select 203, 32400.531, 52.99, 50, 'xxx' union all
select 204, 32400.532, 52.91, 3119, 'xxx' union all
select 205, 32400.837, 52.91, 3119, 'xxx' union all
select 206, 32400.837, 52.91, 3119, 'xxx' union all
select 207, 32400.837, 52.91, 3119, 'xxx' union all
select 208, 32400.839, 52.92, 3220, 'xxx' union all
select 209, 32400.839, 52.92, 3220, 'xxx' union all
select 210, 32400.839, 52.92, 3220, 'xxx'
),
events as
(
select 328 as id, 32400.835 as time , 52.91 as price , 3119 as volume , 'yyy' as bar union all
select 329 , 32400.837 , 52.91 , 3119 , 'yyy' union all
select 330 , 32400.837 , 52.91 , 3119 , 'yyy' union all
select 331 , 32400.837 , 52.91 , 3119 , 'yyy' union all
select 332 , 32400.838 , 52.91 , 3119 , 'yyy' union all
select 333 , 32400.838 , 52.91 , 3119 , 'yyy' union all
select 334 , 32400.839 , 52.92 , 3220 , 'yyy' union all
select 335 , 32400.839 , 52.92 , 3220 , 'yyy' union all
select 336 , 32400.839 , 52.92 , 3220 , 'yyy' union all
select 337 , 32400.840 , 52.91 , 2501 , 'yyy'
),
tradesWithRowNumber AS
(
select *
,ROW_NUMBER() over (PARTITION by time, price, volume order by time, price, volume) as RowNum
from trades
),
eventsWithRowNumber AS
(
select *
,ROW_NUMBER() over (PARTITION by time, price, volume order by time, price, volume) as RowNum
from events
)
select t.time,
t.price,
t.volume,
t.foo,
e.bar
FROM tradesWithRowNumber t
inner JOIN
eventsWithRowNumber e on e.time = t.time
AND e.price = t.price
AND e.volume = t.volume
and e.RowNum = t.RowNum
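If I understand correctly, you just want to list the foo and bar columns side by side without creating a Cartesian product. To do that, you can introduce a new column with row_number() and join on that column as well: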
SELECT *
FROM (SELECT e.*,
ROW_NUMBER() OVER (PARTITION BY time, price, volume ORDER BY id) as seqnum
FROM events e
) e INNER JOIN
(SELECT t.*,
ROW_NUMBER() OVER (PARTITION BY time, price, volume ORDER BY id) as seqnum
FROM trades t
) t
ON t.time = e.time AND t.price = e.price AND t.volume = e.volume AND
t.seqnum = e.seqnum;
It is unclear whether you need an inner join, a left outer join, or a full outer join.
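If events without a matching trade should be kept, the same row-number pairing can be combined with a LEFT JOIN. This is a sketch under the assumption that id gives a meaningful order within each group of identical rows; the aliases event_id, trade_id, ev and tr are illustrative.

```sql
-- Sketch: pair the k-th trade of each (time, price, volume) group with
-- the k-th event of the same group, keeping events that have no trade.
SELECT e.id AS event_id,
       t.id AS trade_id,   -- NULL when the event has no matching trade
       e.time,
       e.price,
       e.volume,
       t.foo,
       e.bar
FROM (SELECT ev.*,
             ROW_NUMBER() OVER (PARTITION BY time, price, volume
                                ORDER BY id) AS seqnum
      FROM events ev) e
LEFT JOIN
     (SELECT tr.*,
             ROW_NUMBER() OVER (PARTITION BY time, price, volume
                                ORDER BY id) AS seqnum
      FROM trades tr) t
  ON  t.time = e.time AND t.price = e.price AND t.volume = e.volume
  AND t.seqnum = e.seqnum
ORDER BY e.id;
```

Rows whose trade_id is NULL are events with no corresponding trade. Replacing LEFT JOIN with INNER JOIN keeps only the matched pairs, and FULL OUTER JOIN would also keep trades that have no event.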