Joining large tables in SQL
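I have a table named "calls" with the following columns:
a_imei_number b_imei_number a_phone_number b_phone_number call_start_time call_end_time
If a particular phone x calls y, x's IMEI number is in the a_imei_number column; if y calls x, x's IMEI number is in the b_imei_number column. In short, a_imei_number and b_imei_number distinguish an IMEI's outgoing calls from its incoming calls. The same goes for the phone_number columns.
I am searching for calls from a specific IMEI that happened at the same time (a cloned IMEI number). My thinking is that if I find a call whose call_start_time falls between another call's call_start_time and call_end_time, I have found a cloned phone. Logically, the IMEI numbers must be the same and the phone numbers must differ.
So I wrote: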
select * from calls c1 , calls c2
where (c1.a_imei = 1234 or c1.b_imei = 1234)
and
c1.call_start_time between c2.call_start_time and c2.call_end_time
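The table may hold 500M rows, so this query does not return; the result might come back in a week. Is there another way to find the result without joining the table to itself like this?
This may not help you completely, but hopefully it gives someone more knowledgeable a starting point:
Improve the join: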
SELECT *
FROM calls c1
INNER JOIN calls c2 ON c1.call_start_time BETWEEN c2.call_start_time AND c2.call_end_time
WHERE (c1.a_imei = 1234 or c1.b_imei = 1234)
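Other suggestions:
SELECT * by itself will be inefficient, especially since here it returns non-unique column names; you should select only the columns relevant to your query.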
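To make the point about column lists concrete, here is a minimal sketch that compares the result columns of SELECT * with an explicit, aliased column list on the same self-join. It assumes SQLite driven from Python purely so the example is runnable; the empty in-memory table and the alias names (base_start, other_a_imei, and so on) are hypothetical, and the short column names a_imei/b_imei follow the queries in this thread rather than the column list in the question.

```python
import sqlite3

# In-memory table with the columns described in the question (types are assumed).
conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE calls (
    a_imei INTEGER, b_imei INTEGER,
    a_phone_number TEXT, b_phone_number TEXT,
    call_start_time INTEGER, call_end_time INTEGER
)""")

join_clause = """
FROM calls c1
INNER JOIN calls c2
  ON c1.call_start_time BETWEEN c2.call_start_time AND c2.call_end_time
WHERE (c1.a_imei = 1234 OR c1.b_imei = 1234)
"""

# SELECT * pulls every column from both sides of the self-join.
star = conn.execute("SELECT * " + join_clause)
star_columns = [d[0] for d in star.description]

# Naming and aliasing columns keeps the result narrow and unambiguous.
explicit = conn.execute("""
SELECT c1.call_start_time AS base_start,
       c2.a_imei          AS other_a_imei,
       c2.b_imei          AS other_b_imei,
       c2.call_start_time AS other_start,
       c2.call_end_time   AS other_end
""" + join_clause)
explicit_columns = [d[0] for d in explicit.description]

print(len(star_columns), star_columns)
print(explicit_columns)
```

The star query carries all six columns twice, while the explicit list returns only the five aliased columns that the caller actually reads.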
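If I understand correctly, you are looking for calls made at the same time as calls to or from a particular number. The following query expresses this idea: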
select c2.*
from (select c.*
from calls c
where c.a_imei = 1234 or c.b_imei = 1234
) cbase join
calls c2
on cbase.call_start_time between c2.call_start_time and c2.call_end_time;
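Performance will depend heavily on the number of rows matched by the first subquery.
Database engines sometimes have a hard time optimizing an or in a condition. I suggest creating indexes on calls(a_imei, call_start_time) and calls(b_imei, call_start_time) and rewriting the query as: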
select c2.*
from ((select c.call_start_time
from calls c
where c.a_imei = 1234
) union all
(select c.call_start_time
from calls c
where c.b_imei = 1234
)
) cbase join
calls c2
on cbase.call_start_time between c2.call_start_time and c2.call_end_time;
For the final join, a third index would be useful: calls(call_start_time, call_end_time).
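As a rough sketch of the index-plus-union-all suggestion, the following builds a tiny in-memory calls table and runs the rewritten query against it. SQLite, the three sample rows, the integer timestamps, and the index names are all assumptions made for illustration; on a real 500M-row table the behaviour depends on your engine and its planner. SQLite does not accept parentheses around the individual branches of a union, so they are dropped here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE calls (
    a_imei INTEGER, b_imei INTEGER,
    a_phone_number TEXT, b_phone_number TEXT,
    call_start_time INTEGER, call_end_time INTEGER
);
-- The three indexes suggested in the answer (index names are hypothetical).
CREATE INDEX idx_calls_a_imei_start ON calls (a_imei, call_start_time);
CREATE INDEX idx_calls_b_imei_start ON calls (b_imei, call_start_time);
CREATE INDEX idx_calls_start_end ON calls (call_start_time, call_end_time);
""")

# Made-up sample data: two overlapping calls involving IMEI 1234, one unrelated call.
rows = [
    (1234, 9001, "555-0001", "555-0100", 100, 200),
    (9002, 1234, "555-0101", "555-0002", 150, 250),
    (7777, 8888, "555-0200", "555-0201", 500, 600),
]
conn.executemany("INSERT INTO calls VALUES (?, ?, ?, ?, ?, ?)", rows)

# Each UNION ALL branch filters on one column, so each can use its own index.
query = """
SELECT c2.a_imei, c2.b_imei, c2.call_start_time, c2.call_end_time
FROM (
    SELECT c.call_start_time FROM calls c WHERE c.a_imei = ?
    UNION ALL
    SELECT c.call_start_time FROM calls c WHERE c.b_imei = ?
) cbase
JOIN calls c2
  ON cbase.call_start_time BETWEEN c2.call_start_time AND c2.call_end_time
ORDER BY c2.call_start_time, c2.a_imei
"""
overlapping = conn.execute(query, (1234, 1234)).fetchall()
for row in overlapping:
    print(row)
```

With this sample data the query returns three rows: the first call appears twice because every base call also matches itself, which is equally true of the query as written in the answer.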
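There are a few things you can do to improve your query.
Indexes
It looks like you should have indexes defined on a_imei and b_imei. Depending on your queries, you may also want to include the call start and end times in those indexes.
Specify columns
Do not use select *; instead, specify the list of columns you want returned.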
select
a_imei_number,
b_imei_number,
call_start_time,
call_end_time
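Join properly
This depends entirely on what you are looking for in the results. If you want to report all possible duplicates, you could structure the query one way.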
select c2.a_imei, c2.b_imei, c2.call_start_time, c2.call_end_time
from (select c.a_imei, c.b_imei, c.call_start_time, c.call_end_time
from calls c
where c.a_imei = c.b_imei
) cbase join
calls c2
on cbase.call_start_time between c2.call_start_time and c2.call_end_time;
If you have a known imei_number to search for, the query would be structured differently.
select c2.a_imei, c2.b_imei, c2.call_start_time, c2.call_end_time
from (select c.a_imei, c.b_imei, c.call_start_time, c.call_end_time
from calls c
where c.a_imei = 1234 or c.b_imei = 1234
) cbase join
calls c2
on cbase.call_start_time between c2.call_start_time and c2.call_end_time;
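None of the queries in this thread enforce the condition stated in the question, namely that the IMEI must match while the phone numbers differ. One possible way to express it, offered only as a sketch and not as any answerer's method, is to flatten each call into one row per side and then self-join that smaller set. SQLite, the legs CTE name, the sample rows, and the integer timestamps are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE calls (
    a_imei INTEGER, b_imei INTEGER,
    a_phone_number TEXT, b_phone_number TEXT,
    call_start_time INTEGER, call_end_time INTEGER
)""")
conn.executemany(
    "INSERT INTO calls VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1234, 9001, "555-0001", "555-0100", 100, 200),  # IMEI 1234 on 555-0001
        (9002, 1234, "555-0101", "555-0002", 150, 250),  # IMEI 1234 on 555-0002, overlapping
        (1234, 9003, "555-0001", "555-0102", 300, 400),  # same phone again, no overlap
    ],
)

# "legs" holds one row per side of a call for the IMEI of interest:
# the a-side columns when it placed the call, the b-side when it received it.
query = """
WITH legs AS (
    SELECT a_imei AS imei, a_phone_number AS phone, call_start_time, call_end_time
    FROM calls WHERE a_imei = :imei
    UNION ALL
    SELECT b_imei, b_phone_number, call_start_time, call_end_time
    FROM calls WHERE b_imei = :imei
)
SELECT l1.phone AS phone, l2.phone AS other_phone,
       l1.call_start_time AS start_time, l2.call_start_time AS other_start_time
FROM legs l1
JOIN legs l2
  ON l1.imei = l2.imei
 AND l1.phone <> l2.phone
 AND l1.call_start_time BETWEEN l2.call_start_time AND l2.call_end_time
"""
suspects = conn.execute(query, {"imei": 1234}).fetchall()
print(suspects)
```

Because the self-join only touches rows for the one IMEI, its cost depends on how many calls that IMEI has rather than on the size of the whole table, and the phone inequality also stops a call from matching itself.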