Joining large tables in SQL

I have a table named "calls" with the following columns:

a_imei_number
b_imei_number
a_phone_number
b_phone_number

call_start_time
call_end_time

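If a particular phone named x calls y, x's IMEI number is in the a_imei_number column; if y calls x, x's IMEI number is in the b_imei_number column. In short, a_imei_number and b_imei_number hold an IMEI's outgoing and incoming calls, respectively. The same applies to the phone_number columns.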

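I am searching for calls from a specific IMEI that took place at the same time (a cloned IMEI number). My thinking is that if I find a call whose call_start_time falls between another call's call_start_time and call_end_time, I have found a cloned phone. Logically, the IMEI numbers must be the same and the phone numbers must be different.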

So I wrote:

select * from calls c1 , calls c2 
where (c1.a_imei = 1234 or c1.b_imei = 1234) 
and 
c1.call_start_time between c2.call_start_time and c2.call_end_time

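The table may hold about 500M rows, so this query never returns; the result might take a week to come back. Is there another way to find the result without joining the table to itself like this?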

This may not solve your problem completely, but hopefully it gives people with more knowledge something to start from:

Improved join

SELECT * 
FROM calls c1 
INNER JOIN calls c2 ON c1.call_start_time BETWEEN c2.call_start_time AND c2.call_end_time
WHERE (c1.a_imei = 1234 or c1.b_imei = 1234) 

Other suggestions:

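SELECT * is inefficient in itself, especially here, where the self-join makes it return non-unique column names. You should select only the columns that are relevant to your query.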

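If I understand correctly, you are looking for calls that overlap in time with a call placed to or from a particular number. The following query expresses that idea: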

select c2.*
from (select c.*
      from calls c
      where c.a_imei = 1234 or c.b_imei = 1234
     ) cbase join
     calls c2
     on cbase.call_start_time between c2.call_start_time and c2.call_end_time;

Performance will depend heavily on how many rows the subquery matches.

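Database engines sometimes have difficulty optimizing conditions that contain or. I would suggest creating indexes on calls(a_imei, call_start_time) and calls(b_imei, call_start_time), and rewriting the query as: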

select c2.*
from ((select c.call_start_time
       from calls c
       where c.a_imei = 1234
      ) union all
      (select c.call_start_time
       from calls c
       where c.b_imei = 1234
      )
     ) cbase join
     calls c2
     on cbase.call_start_time between c2.call_start_time and c2.call_end_time;

For the final join, a third index would be useful: calls(call_start_time, call_end_time)
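
As a rough illustration of the suggestion above, the sketch below builds the three indexes and runs the union all query against a tiny in-memory SQLite database through Python's sqlite3 module. SQLite is only a stand-in for whatever engine actually holds the table; the index names, the integer timestamps and the sample rows are all invented for the example, and the inner selects are written without parentheses because SQLite does not accept them there.

```python
import sqlite3

# Hypothetical in-memory copy of the calls table (SQLite stands in for the real database).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE calls (
    a_imei INTEGER, b_imei INTEGER,
    call_start_time INTEGER, call_end_time INTEGER
);
-- The three indexes suggested in the answer; the index names are made up.
CREATE INDEX idx_calls_a_imei_start ON calls (a_imei, call_start_time);
CREATE INDEX idx_calls_b_imei_start ON calls (b_imei, call_start_time);
CREATE INDEX idx_calls_start_end ON calls (call_start_time, call_end_time);
""")

# Invented sample rows; times are plain integers to keep the example small.
conn.executemany(
    "INSERT INTO calls VALUES (?, ?, ?, ?)",
    [
        (1234, 5555, 100, 200),  # outgoing call of IMEI 1234
        (7777, 1234, 150, 160),  # incoming call of IMEI 1234, inside the first call
        (8888, 9999, 500, 600),  # unrelated call
    ],
)

UNION_ALL_QUERY = """
SELECT c2.a_imei, c2.b_imei, c2.call_start_time, c2.call_end_time
FROM (SELECT c.call_start_time FROM calls c WHERE c.a_imei = :imei
      UNION ALL
      SELECT c.call_start_time FROM calls c WHERE c.b_imei = :imei
     ) cbase
JOIN calls c2
  ON cbase.call_start_time BETWEEN c2.call_start_time AND c2.call_end_time
ORDER BY c2.call_start_time, c2.a_imei
"""

rows = conn.execute(UNION_ALL_QUERY, {"imei": 1234}).fetchall()
for row in rows:
    print(row)
```

Note that every call of the searched IMEI also matches itself, because its own start time always lies within its own interval, so the result contains those rows as well as the genuinely overlapping ones.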

There are a few things you can do to improve your query.

Indexes

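It looks like you should define indexes on a_imei and b_imei. Depending on the queries you run, you may also want to include the call start and end times in those indexes.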
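
One way to check whether such an index is actually picked up is to ask the database for its query plan. The sketch below does this in SQLite via Python's sqlite3 module, purely as an assumption-laden stand-in: the real engine will have its own EXPLAIN syntax and its own planner, and the index name is hypothetical.

```python
import sqlite3

# Empty in-memory table; only the schema matters for inspecting the plan.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE calls (
    a_imei INTEGER, b_imei INTEGER,
    call_start_time INTEGER, call_end_time INTEGER
);
-- Index on the IMEI that also carries the start time (made-up name).
CREATE INDEX idx_calls_a_imei_start ON calls (a_imei, call_start_time);
""")

# EXPLAIN QUERY PLAN is SQLite-specific; the last column of each row is the plan text.
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT call_start_time FROM calls WHERE a_imei = 1234"
).fetchall()
plan_text = " ".join(row[3] for row in plan)
print(plan_text)
```

Because the index holds both a_imei and call_start_time, this particular query can be answered from the index alone, which is the reason for including the time column in it.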

Specify columns

Instead of using select *, specify the list of columns you want returned.

select
    a_imei_number,
    b_imei_number,
    call_start_time,
    call_end_time

Join properly

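This depends entirely on what you are looking for in the results. If you want to report all possible duplicates, you could structure the query like this.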

select c2.a_imei, c2.b_imei, c2.call_start_time, c2.call_end_time
from (select c.a_imei, c.b_imei, c.call_start_time, c.call_end_time
      from calls c
      where c.a_imei = c.b_imei
     ) cbase join
     calls c2
     on cbase.call_start_time between c2.call_start_time and c2.call_end_time;

If you have a known imei_number to search for, the query is structured differently.

select c2.a_imei, c2.b_imei, c2.call_start_time, c2.call_end_time
from (select c.a_imei, c.b_imei, c.call_start_time, c.call_end_time
      from calls c
      where c.a_imei = 1234 or c.b_imei = 1234
     ) cbase join
     calls c2
     on cbase.call_start_time between c2.call_start_time and c2.call_end_time;
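
The question also says that a cloned phone shows up as the same IMEI with different phone numbers, a condition the queries in this thread do not check. A minimal sketch of one way to add it follows, again using SQLite through Python's sqlite3 module as a stand-in. It is not taken from any of the answers: the legs CTE, the sample rows and the integer timestamps are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE calls (
    a_imei_number INTEGER, b_imei_number INTEGER,
    a_phone_number TEXT, b_phone_number TEXT,
    call_start_time INTEGER, call_end_time INTEGER
);
""")
# Invented sample rows; times are plain integers.
conn.executemany(
    "INSERT INTO calls VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1234, 4321, "100", "900", 100, 200),  # IMEI 1234 used with phone number 100
        (1234, 5678, "101", "901", 150, 160),  # same IMEI, other number, overlapping
        (1234, 8765, "100", "902", 300, 400),  # same IMEI, no overlap
    ],
)

CLONE_QUERY = """
WITH legs AS (  -- one row per call side; 'legs' is a made-up name
    SELECT a_imei_number AS imei, a_phone_number AS phone,
           call_start_time, call_end_time
    FROM calls
    UNION ALL
    SELECT b_imei_number, b_phone_number, call_start_time, call_end_time
    FROM calls
)
SELECT l1.phone AS phone, l2.phone AS other_phone, l1.call_start_time
FROM legs l1
JOIN legs l2
  ON l2.imei = l1.imei          -- same IMEI
 AND l2.phone <> l1.phone       -- different phone number
 AND l1.call_start_time BETWEEN l2.call_start_time AND l2.call_end_time
WHERE l1.imei = :imei
"""

suspects = conn.execute(CLONE_QUERY, {"imei": 1234}).fetchall()
print(suspects)
```

Requiring different phone numbers also removes the rows where a call merely matches itself. On a table of this size the same indexing advice would still apply to the columns used in the join.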

