How can I optimize the following MySQL query to compute concurrent calls per second?

The following query reads data from the DB1.Data table. It works correctly but is very slow. The result is the number of concurrent calls, derived from CDR information.

MySQL query

select sql_calc_found_rows H,M,S,(TCNT+ADCNT) as CNT from
(
select H,M,S,sum(CNT) as TCNT,
(
select 
count(id) as CNT
from DB1.Data force index (datetimeOrgination)  where 1=1 and 
(datetimeOrgination<UNIX_TIMESTAMP(concat('2018-02-09',' ',T1.H,':',T1.M,':',T1.S))  and (datetimeOrgination+callDuration)>UNIX_TIMESTAMP(concat('2018-02-09',' ',T1.H,':',T1.M,':',T1.S))) 
  and (DB1.Data.datetimeOrgination between UNIX_TIMESTAMP('2018-02-09 00:00:00') and UNIX_TIMESTAMP('2018-02-09 23:59:59'))   
) as ADCNT 
 from 
(
(select 
hour(from_unixtime(datetimeOrgination)) as H,
minute(from_unixtime(datetimeOrgination)) as M,
second(from_unixtime(datetimeOrgination)) as S,
count(id) as CNT  
from DB1.Data where 1=1  and (DB1.Data.datetimeOrgination between UNIX_TIMESTAMP('2018-02-09 00:00:00') and UNIX_TIMESTAMP('2018-02-09 23:59:59'))    
group by hour(from_unixtime(datetimeOrgination)),minute(from_unixtime(datetimeOrgination)),second(from_unixtime(datetimeOrgination)))

Union  all

(select 
hour(from_unixtime(datetimeOrgination+callDuration)) as H,
minute(from_unixtime(datetimeOrgination+callDuration)) as M,
second(from_unixtime(datetimeOrgination+callDuration)) as S,
count(id) as CNT 
from DB1.Data  force index (datetimeOrgination) where 1=1 and  
(second(from_unixtime(datetimeOrgination+callDuration))>second(from_unixtime(datetimeOrgination)))   and (DB1.Data.datetimeOrgination between UNIX_TIMESTAMP('2018-02-09 00:00:00') and UNIX_TIMESTAMP('2018-02-09 23:59:59'))    
group by hour(from_unixtime(datetimeOrgination+callDuration)),minute(from_unixtime(datetimeOrgination+callDuration)),second(from_unixtime(datetimeOrgination+callDuration)))
) as T1  group by H,M,S
) as T2;

Here's the EXPLAIN output:

[image: EXPLAIN output of the query]

This is the query output in JSON format:

{
"meta": {
    "count": 18,
    "totalCount": 18
},
"calls": [{
    "H": 10,
    "M": 30,
    "S": 44,
    "CNT": 1
}, {
    "H": 11,
    "M": 27,
    "S": 1,
    "CNT": 1
}, {
    "H": 11,
    "M": 28,
    "S": 44,
    "CNT": 1
}, {
    "H": 12,
    "M": 23,
    "S": 52,
    "CNT": 1
}, {
    "H": 12,
    "M": 29,
    "S": 27,
    "CNT": 1
}, {
    "H": 12,
    "M": 30,
    "S": 38,
    "CNT": 1
}, {
    "H": 14,
    "M": 26,
    "S": 17,
    "CNT": 1
}, {
    "H": 14,
    "M": 26,
    "S": 44,
    "CNT": 1
}, {
    "H": 14,
    "M": 26,
    "S": 51,
    "CNT": 1
}, {
    "H": 14,
    "M": 27,
    "S": 2,
    "CNT": 1
}, {
    "H": 14,
    "M": 27,
    "S": 8,
    "CNT": 1
}, {
    "H": 14,
    "M": 40,
    "S": 27,
    "CNT": 1
}, {
    "H": 14,
    "M": 40,
    "S": 57,
    "CNT": 1
}, {
    "H": 14,
    "M": 40,
    "S": 58,
    "CNT": 1
}, {
    "H": 15,
    "M": 8,
    "S": 4,
    "CNT": 1
}, {
    "H": 15,
    "M": 8,
    "S": 31,
    "CNT": 1
}, {
    "H": 15,
    "M": 56,
    "S": 38,
    "CNT": 1
}, {
    "H": 16,
    "M": 27,
    "S": 30,
    "CNT": 1
}]
}

The first record in the result

  "H": 10,
    "M": 30,
    "S": 44,
    "CNT": 1

shows that we have 1 concurrent call at 10:30:44.


More details

To calculate the concurrent calls per second, we should count 3 types of calls for each second.

For example, if we want to calculate the concurrent calls for 10:51:20, we need to count all of the following:

Step 1-Count all calls that started at 10:51:20.

Step 2-Count all calls that ended at 10:51:20, but did not start in the same second (20).

Step 3-Count all calls that started before 10:51:20 and ended after 10:51:20.

Step 4-Finally, sum all of them to get the concurrent calls.

This query is for Step 1:

(select 
hour(from_unixtime(datetimeOrgination)) as H,
minute(from_unixtime(datetimeOrgination)) as M,
second(from_unixtime(datetimeOrgination)) as S,
count(id) as CNT  
from DB1.Data where 1=1  and (DB1.Data.datetimeOrgination between UNIX_TIMESTAMP('2018-02-09 00:00:00') and UNIX_TIMESTAMP('2018-02-09 23:59:59'))    
group by hour(from_unixtime(datetimeOrgination)),minute(from_unixtime(datetimeOrgination)),second(from_unixtime(datetimeOrgination)))

This query is for Step 2:

(select 
hour(from_unixtime(datetimeOrgination+callDuration)) as H,
minute(from_unixtime(datetimeOrgination+callDuration)) as M,
second(from_unixtime(datetimeOrgination+callDuration)) as S,
count(id) as CNT 
from DB1.Data  force index (datetimeOrgination) where 1=1 and  
(second(from_unixtime(datetimeOrgination+callDuration))>second(from_unixtime(datetimeOrgination)))   and (DB1.Data.datetimeOrgination between UNIX_TIMESTAMP('2018-02-09 00:00:00') and UNIX_TIMESTAMP('2018-02-09 23:59:59'))    
group by hour(from_unixtime(datetimeOrgination+callDuration)),minute(from_unixtime(datetimeOrgination+callDuration)),second(from_unixtime(datetimeOrgination+callDuration)))

This query is for Step 3; it runs against the union of the 2 previous queries:

(
select 
count(id) as CNT
from DB1.Data force index (datetimeOrgination)  where 1=1 and 
(datetimeOrgination<UNIX_TIMESTAMP(concat('2018-02-09',' ',T1.H,':',T1.M,':',T1.S))  and (datetimeOrgination+callDuration)>UNIX_TIMESTAMP(concat('2018-02-09',' ',T1.H,':',T1.M,':',T1.S))) 
  and (DB1.Data.datetimeOrgination between UNIX_TIMESTAMP('2018-02-09 00:00:00') and UNIX_TIMESTAMP('2018-02-09 23:59:59'))   
) as ADCNT 

This query gathers all of them and returns the final result:

select sql_calc_found_rows H,M,S,(TCNT+ADCNT) as CNT from
(

As I mentioned before, the query works but is very slow and complex. I know it needs optimization and simplification.


Field types

`datetimeOrgination` BIGINT(20) NOT NULL DEFAULT
`callDuration` BIGINT(20) NOT NULL DEFAULT '0',

and indexes

INDEX `datetimeOrgination` (`datetimeOrgination`),
INDEX `callDuration` (`callDuration`),

Caveat: Some of my suggestions are for clarity or simplification, not necessarily for speed.

Potential bug: and (second(from_unixtime(datetimeOrgination+callDuration)) > second(from_unixtime(datetimeOrgination))) does not make much sense. It will catch a 2-second call that started at 11:22:00, but not one that started at 11:21:59. Is that really what you wanted? In any case, please explain what the query is trying to do.

Don't work with H,M,S; work with just seconds, either by extracting the hh:mm:ss string from the date or by getting the time of day in seconds. Convert to H,M,S as the last step, not the first.
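As a rough sketch of the "seconds first, H,M,S last" idea: the query below assumes datetimeOrgination holds Unix timestamps in seconds (which the question's use of FROM_UNIXTIME suggests) and that the day contains no daylight-saving shift. The aliases secs, cnt, hms and per_second are made up for illustration.

```sql
-- Count calls per start second, carrying the time as one number
-- (seconds since local midnight) instead of three H, M, S columns.
SELECT
    SEC_TO_TIME(secs) AS hms,   -- convert to hh:mm:ss only as the last step
    cnt
FROM (
    SELECT
        datetimeOrgination - UNIX_TIMESTAMP('2018-02-09') AS secs,
        COUNT(*) AS cnt
    FROM DB1.Data
    WHERE datetimeOrgination >= UNIX_TIMESTAMP('2018-02-09')
      AND datetimeOrgination <  UNIX_TIMESTAMP('2018-02-09' + INTERVAL 1 DAY)
    GROUP BY secs
) AS per_second
ORDER BY secs;
```

Grouping on a single integer keeps the GROUP BY to one expression rather than three function calls per row.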

Don't FORCE INDEX -- it may help today, but hurt tomorrow.

Change

  and (DB1.Data.datetimeOrgination between UNIX_TIMESTAMP('2018-02-09 00:00:00') AND UNIX_TIMESTAMP('2018-02-09 23:59:59'))

to

  AND  DB1.Data.datetimeOrgination >= '2018-02-09'
  AND  DB1.Data.datetimeOrgination  < '2018-02-09' + INTERVAL 1 DAY

(Again, that is for clarity, not speed.)
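The rewritten filter above compares the column with date strings, which suits a DATETIME or TIMESTAMP column. Since the question declares datetimeOrgination as BIGINT, a variant that keeps the half-open range but compares numbers might look like this (a sketch, assuming the column stores Unix timestamps in seconds):

```sql
-- Half-open range [start of day, start of next day) for a BIGINT
-- column that stores Unix timestamps in seconds.
SELECT COUNT(*)
FROM DB1.Data
WHERE datetimeOrgination >= UNIX_TIMESTAMP('2018-02-09')
  AND datetimeOrgination <  UNIX_TIMESTAMP('2018-02-09' + INTERVAL 1 DAY);
```

The column is still compared directly against constants, so the existing index on datetimeOrgination remains usable for the range.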

Use COUNT(*) instead of COUNT(id).

I'm doing a lot of guessing; help us out by providing SHOW CREATE TABLE. It smells like you are using the wrong datatype for datetimeOrgination.

After converting to seconds (from H,M,S), this

 datetimeOrgination < UNIX_TIMESTAMP(concat('2018-02-09',' ',T1.H,':',T1.M,':',T1.S))

becomes something like

 datetimeOrgination < '2018-02-09' + INTERVAL secs SECOND

Even better would be to extract the datetime from the subquery and move to something like

 datetimeOrgination < datetime_from_subquery

This may give a better chance of using the index.

Clean up the code and explain the goal; I'll try to come up with some more speedups.

(Since the definition of the problem is moving, I am starting a new Answer.)

The number of calls (of all types) at a specific point in time is simply:

SELECT COUNT(*) FROM tbl
    WHERE call_start            <= '2018-02-14 15:11:35'
      AND call_start + duration >= '2018-02-14 15:11:35';

But I will quibble that the answer is "high", because it does not take into account in what part of the given second the call started or ended. So, I think this is closer to correct:

SELECT COUNT(*) FROM tbl
    WHERE call_start            <  '2018-02-14 15:11:35'
      AND call_start + duration >= '2018-02-14 15:11:35';

This should come as close as possible to saying how many concurrent calls happened at exactly '2018-02-14 15:11:35.000000'; it is an approximation of the number for '2018-02-14 15:11:35.5'.

By changing COUNT(*) to SUM(...) (as already discussed), you can get the count for a given type of call.

Then you add GROUP BY using datetime or timestamp arithmetic to finish out the task.
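One way these pieces might fit together against the question's table is sketched below. It is an illustration of the shape, not a tuned query: it assumes datetimeOrgination and callDuration are both in seconds, uses the inclusive comparison from the first query above, and picks every second in which some call started as a probe instant. The aliases p, d and probe are made up, and the SUM(...) condition on callDuration only stands in for whatever test defines a "type of call" in your data.

```sql
-- For each second of the day in which at least one call started,
-- count the calls in progress at that instant.
SELECT
    FROM_UNIXTIME(p.probe)    AS at_instant,
    COUNT(*)                  AS concurrent_calls,
    SUM(d.callDuration >= 60) AS calls_of_60s_or_more  -- illustrative SUM(...)
FROM (
    SELECT DISTINCT datetimeOrgination AS probe
    FROM DB1.Data
    WHERE datetimeOrgination >= UNIX_TIMESTAMP('2018-02-09')
      AND datetimeOrgination <  UNIX_TIMESTAMP('2018-02-09' + INTERVAL 1 DAY)
) AS p
JOIN DB1.Data AS d
  ON d.datetimeOrgination                  <= p.probe
 AND d.datetimeOrgination + d.callDuration >= p.probe
GROUP BY p.probe
ORDER BY p.probe;
```

Because the join is a range self-join, it can still be slow on a large table; checking it with EXPLAIN on real data is advisable. Changing <= to < gives the stricter variant from the second query above.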

One day

To catch all calls that started during a single day:

WHERE call_start >= '2018-02-09'
  AND call_start  < '2018-02-09' + INTERVAL 1 DAY

Problem Definition is wrong

"For calculate the concurrent calls per seconds, we should count 3 type of calls per second..."

I contend that that is mathematically wrong.

"Concurrent calls" is measured at an instant, not across a whole second (or hour or day). It means "how many phone connections are in use at that instant".

Let me change the statement of the problem to "concurrent calls per hour". Does that make sense? You can ask about "calls per hour", which could be interpreted as "calls initiated per hour" and be computed via datetimeOrgination and a GROUP BY.

Suppose I made a call at the start of each minute, and each lasted 59 seconds. A single phone line could handle that. I suggest that is "1 concurrent call".

In contrast, what if I had 60 people all starting their 59-second calls at noon? That would take 60 phone lines. That would be 60 concurrent calls during the busy time of the day.

The metric you have involves a datetimeOrgination that is truncated (or rounded?) to a 1-second boundary.

Now let me modify the example to better explain why your 3 steps are wrong. I want to group by hour, and I am willing to measure the number of calls at the top of the hour. In particular, let's look at the 10 o'clock hour.

  • 09:55 - 10:05 -- a 10-minute call that is counted, by your algorithm, in each of the 09 and 10 hours.
  • 10:20 - 10:30 -- a 10-minute call that is counted, by your algorithm, in only the 10 hour.

Why should a 10-minute call be counted as belonging to two hours? This inflates the "concurrency" count.

  • 09:05 - 10:55 -- a 110-minute call that is also counted in each of the 09 and 10 hours.
  • 09:30 - 11:30 -- a 120-minute call that is counted in 3 hours (09, 10 and 11). Again, over-counting.
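To make the contrast concrete, here is a small self-contained sketch that measures the four example calls above at top-of-the-hour instants, using the "start < T and end >= T" test from the earlier query. The date is arbitrary and the derived tables p (probe instants) and c (calls) are made up for the illustration; no real table is read.

```sql
-- Count the calls in progress at 09:00, 10:00 and 11:00.
SELECT p.probe, COUNT(c.call_start) AS concurrent_calls
FROM (
    SELECT TIMESTAMP('2018-02-09 09:00:00') AS probe
    UNION ALL SELECT TIMESTAMP('2018-02-09 10:00:00')
    UNION ALL SELECT TIMESTAMP('2018-02-09 11:00:00')
) AS p
LEFT JOIN (
    SELECT TIMESTAMP('2018-02-09 09:55:00') AS call_start,
           TIMESTAMP('2018-02-09 10:05:00') AS call_end
    UNION ALL SELECT TIMESTAMP('2018-02-09 10:20:00'), TIMESTAMP('2018-02-09 10:30:00')
    UNION ALL SELECT TIMESTAMP('2018-02-09 09:05:00'), TIMESTAMP('2018-02-09 10:55:00')
    UNION ALL SELECT TIMESTAMP('2018-02-09 09:30:00'), TIMESTAMP('2018-02-09 11:30:00')
) AS c
  ON c.call_start <  p.probe
 AND c.call_end   >= p.probe
GROUP BY p.probe
ORDER BY p.probe;
-- 09:00 -> 0, 10:00 -> 3, 11:00 -> 1
```

Each call contributes only at the instants when it is actually in progress, so the 10:20 - 10:30 call does not appear in any of the three measurements and no call is counted for an hour merely because it touched it.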

So, I contend that the only reasonable computation is:

Step 1-Count all calls started at 10:51:20 -- counted as happening at the :20 instant.

Step 2-Calls that ended before 10:51:20, but not started in the same second (20) -- not counted for :20.

Step 3-Count all calls started before 10:51:20 and ended after 10:51:20 -- counted for the :20 instant.

My suggested solution achieves that modification, and is both simpler and mathematically 'correct'.
