How can I optimize the following MySQL query that computes concurrent calls per second?

The following query reads data from the DB1.Data table. It works correctly but is very slow. Its result is the number of concurrent calls per second, computed from CDR (call detail record) information.

MySQL query

select sql_calc_found_rows H,M,S,(TCNT+ADCNT) as CNT from
(
select H,M,S,sum(CNT) as TCNT,
(
select 
count(id) as CNT
from DB1.Data force index (datetimeOrgination)  where 1=1 and 
(datetimeOrgination<UNIX_TIMESTAMP(concat('2018-02-09',' ',T1.H,':',T1.M,':',T1.S))  and (datetimeOrgination+callDuration)>UNIX_TIMESTAMP(concat('2018-02-09',' ',T1.H,':',T1.M,':',T1.S))) 
  and (DB1.Data.datetimeOrgination between UNIX_TIMESTAMP('2018-02-09 00:00:00') and UNIX_TIMESTAMP('2018-02-09 23:59:59'))   
) as ADCNT 
 from 
(
(select 
hour(from_unixtime(datetimeOrgination)) as H,
minute(from_unixtime(datetimeOrgination)) as M,
second(from_unixtime(datetimeOrgination)) as S,
count(id) as CNT  
from DB1.Data where 1=1  and (DB1.Data.datetimeOrgination between UNIX_TIMESTAMP('2018-02-09 00:00:00') and UNIX_TIMESTAMP('2018-02-09 23:59:59'))    
group by hour(from_unixtime(datetimeOrgination)),minute(from_unixtime(datetimeOrgination)),second(from_unixtime(datetimeOrgination)))

Union  all

(select 
hour(from_unixtime(datetimeOrgination+callDuration)) as H,
minute(from_unixtime(datetimeOrgination+callDuration)) as M,
second(from_unixtime(datetimeOrgination+callDuration)) as S,
count(id) as CNT 
from DB1.Data  force index (datetimeOrgination) where 1=1 and  
(second(from_unixtime(datetimeOrgination+callDuration))>second(from_unixtime(datetimeOrgination)))   and (DB1.Data.datetimeOrgination between UNIX_TIMESTAMP('2018-02-09 00:00:00') and UNIX_TIMESTAMP('2018-02-09 23:59:59'))    
group by hour(from_unixtime(datetimeOrgination+callDuration)),minute(from_unixtime(datetimeOrgination+callDuration)),second(from_unixtime(datetimeOrgination+callDuration)))
) as T1  group by H,M,S
) as T2;

Here's the EXPLAIN output:

[Image: EXPLAIN output]

This is the query output in JSON format:

{
"meta": {
    "count": 18,
    "totalCount": 18
},
"calls": [{
    "H": 10,
    "M": 30,
    "S": 44,
    "CNT": 1
}, {
    "H": 11,
    "M": 27,
    "S": 1,
    "CNT": 1
}, {
    "H": 11,
    "M": 28,
    "S": 44,
    "CNT": 1
}, {
    "H": 12,
    "M": 23,
    "S": 52,
    "CNT": 1
}, {
    "H": 12,
    "M": 29,
    "S": 27,
    "CNT": 1
}, {
    "H": 12,
    "M": 30,
    "S": 38,
    "CNT": 1
}, {
    "H": 14,
    "M": 26,
    "S": 17,
    "CNT": 1
}, {
    "H": 14,
    "M": 26,
    "S": 44,
    "CNT": 1
}, {
    "H": 14,
    "M": 26,
    "S": 51,
    "CNT": 1
}, {
    "H": 14,
    "M": 27,
    "S": 2,
    "CNT": 1
}, {
    "H": 14,
    "M": 27,
    "S": 8,
    "CNT": 1
}, {
    "H": 14,
    "M": 40,
    "S": 27,
    "CNT": 1
}, {
    "H": 14,
    "M": 40,
    "S": 57,
    "CNT": 1
}, {
    "H": 14,
    "M": 40,
    "S": 58,
    "CNT": 1
}, {
    "H": 15,
    "M": 8,
    "S": 4,
    "CNT": 1
}, {
    "H": 15,
    "M": 8,
    "S": 31,
    "CNT": 1
}, {
    "H": 15,
    "M": 56,
    "S": 38,
    "CNT": 1
}, {
    "H": 16,
    "M": 27,
    "S": 30,
    "CNT": 1
}]

}

The first record in the result,

    "H": 10,
    "M": 30,
    "S": 44,
    "CNT": 1

shows that we have 1 concurrent call at 10:30:44.


More details

To calculate the concurrent calls per second, we count three types of calls for each second.

For example, to calculate the concurrent calls for 10:51:20, we need to count all of the following:

Step 1: Count all calls that started at 10:51:20.

Step 2: Count all calls that ended at 10:51:20 but did not start in the same second (20).

Step 3: Count all calls that started before 10:51:20 and ended after 10:51:20.

Step 4: Finally, sum all of them to get the number of concurrent calls.
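To make the four steps concrete, here is a minimal Python sketch of the count for one target second. It models the logic and is not a replacement for the SQL: each call is assumed to be a (datetimeOrgination, callDuration) pair of integer Unix seconds, and Step 2 compares full timestamps, as the prose describes, rather than only the SECOND() values that the SQL compares.

```python
from typing import List, Tuple

# Hypothetical model of a CDR row: (datetimeOrgination, callDuration),
# both in whole seconds, like the question's BIGINT columns.
Call = Tuple[int, int]


def concurrent_at(calls: List[Call], t: int) -> int:
    """Concurrent calls for second t, following the question's Steps 1-4."""
    # Step 1: calls that started at t.
    started = sum(1 for s, _ in calls if s == t)
    # Step 2: calls that ended at t but did not start in that same second.
    ended = sum(1 for s, d in calls if s + d == t and s != t)
    # Step 3: calls that started before t and ended after t.
    spanning = sum(1 for s, d in calls if s < t < s + d)
    # Step 4: the sum of the three groups.
    return started + ended + spanning


calls = [(100, 5), (102, 0), (95, 5), (90, 20)]
print(concurrent_at(calls, 100))  # 3
```

A zero-length call that starts at t is counted once, by Step 1, because Step 2 excludes calls that start and end in the same second.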

This query is for Step 1:

(select 
hour(from_unixtime(datetimeOrgination)) as H,
minute(from_unixtime(datetimeOrgination)) as M,
second(from_unixtime(datetimeOrgination)) as S,
count(id) as CNT  
from DB1.Data where 1=1  and (DB1.Data.datetimeOrgination between UNIX_TIMESTAMP('2018-02-09 00:00:00') and UNIX_TIMESTAMP('2018-02-09 23:59:59'))    
group by hour(from_unixtime(datetimeOrgination)),minute(from_unixtime(datetimeOrgination)),second(from_unixtime(datetimeOrgination)))

This query is for Step 2:

(select 
hour(from_unixtime(datetimeOrgination+callDuration)) as H,
minute(from_unixtime(datetimeOrgination+callDuration)) as M,
second(from_unixtime(datetimeOrgination+callDuration)) as S,
count(id) as CNT 
from DB1.Data  force index (datetimeOrgination) where 1=1 and  
(second(from_unixtime(datetimeOrgination+callDuration))>second(from_unixtime(datetimeOrgination)))   and (DB1.Data.datetimeOrgination between UNIX_TIMESTAMP('2018-02-09 00:00:00') and UNIX_TIMESTAMP('2018-02-09 23:59:59'))    
group by hour(from_unixtime(datetimeOrgination+callDuration)),minute(from_unixtime(datetimeOrgination+callDuration)),second(from_unixtime(datetimeOrgination+callDuration)))

This query is for Step 3. It is a correlated subquery, evaluated for each (H, M, S) row produced by the UNION of the two previous queries:

(
select 
count(id) as CNT
from DB1.Data force index (datetimeOrgination)  where 1=1 and 
(datetimeOrgination<UNIX_TIMESTAMP(concat('2018-02-09',' ',T1.H,':',T1.M,':',T1.S))  and (datetimeOrgination+callDuration)>UNIX_TIMESTAMP(concat('2018-02-09',' ',T1.H,':',T1.M,':',T1.S))) 
  and (DB1.Data.datetimeOrgination between UNIX_TIMESTAMP('2018-02-09 00:00:00') and UNIX_TIMESTAMP('2018-02-09 23:59:59'))   
) as ADCNT 

This outer query gathers all of them and returns the final result:

select sql_calc_found_rows H,M,S,(TCNT+ADCNT) as CNT from
(

As I mentioned before, the query works but is very slow and complex; I know it needs optimization and simplification.


Field types

`datetimeOrgination` BIGINT(20) NOT NULL DEFAULT
`callDuration` BIGINT(20) NOT NULL DEFAULT '0',

and indexes:

INDEX `datetimeOrgination` (`datetimeOrgination`),
INDEX `callDuration` (`callDuration`),

Caveat: Some of my suggestions are for clarity or simplification, not necessarily for speed.

Potential bug: and (second(from_unixtime(datetimeOrgination+callDuration)) > second(from_unixtime(datetimeOrgination))) does not make much sense. It will catch a 2-second call that started at 11:22:00, but not one that started at 11:21:59. Is that really what you wanted? In any case, please explain what the query is trying to do.
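The problem is easy to reproduce outside MySQL. This Python sketch applies the question's condition to the two 2-second calls mentioned above. The 2018-02-09 date and the UTC time zone are assumptions for the example; MySQL's FROM_UNIXTIME() uses the session time zone instead.

```python
from datetime import datetime, timezone


def second_of_minute(ts: int) -> int:
    # Rough analogue of SECOND(FROM_UNIXTIME(ts)), evaluated in UTC here.
    return datetime.fromtimestamp(ts, tz=timezone.utc).second


def question_filter(start: int, duration: int) -> bool:
    # The question's condition: SECOND(start + duration) > SECOND(start)
    return second_of_minute(start + duration) > second_of_minute(start)


def ends_in_later_second(start: int, duration: int) -> bool:
    # Comparing whole timestamps instead: the call ends after its start second.
    return start + duration > start


def ts(hh: int, mm: int, ss: int) -> int:
    # Unix timestamp for 2018-02-09 hh:mm:ss UTC (hypothetical example date/zone).
    return int(datetime(2018, 2, 9, hh, mm, ss, tzinfo=timezone.utc).timestamp())


for start in (ts(11, 22, 0), ts(11, 21, 59)):
    print(question_filter(start, 2), ends_in_later_second(start, 2))
# True True
# False True
```

The second call ends at 11:22:01, and 1 is not greater than 59, so the SECOND() comparison drops it even though it clearly ends in a later second.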

Don't work with H,M,S; work with just seconds, either by extracting the hh:mm:ss string from the date or by getting the time of day in seconds. Convert to H,M,S as the last step, not the first.
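A minimal Python sketch of that advice: bucket by a single integer (seconds since midnight) and split it into H, M, S only when producing the output. The day start of 2018-02-09 00:00:00 UTC is an assumption for the example; in MySQL, SEC_TO_TIME() could handle the final conversion.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def seconds_of_day(ts: int, day_start: int) -> int:
    # Work with one integer: seconds since the start of the day.
    return ts - day_start


def to_hms(secs: int) -> Tuple[int, int, int]:
    # Split into hours, minutes, seconds only at the very end.
    h, rem = divmod(secs, 3600)
    m, s = divmod(rem, 60)
    return h, m, s


def starts_per_second(starts: Iterable[int], day_start: int) -> Dict[Tuple[int, int, int], int]:
    # Group on the integer key; convert to (H, M, S) when building the result.
    counts = Counter(seconds_of_day(ts, day_start) for ts in starts)
    return {to_hms(sec): n for sec, n in sorted(counts.items())}


day_start = 1518134400  # 2018-02-09 00:00:00 UTC (assumed time zone)
starts = [day_start + 37844, day_start + 37844, day_start + 41221]
print(starts_per_second(starts, day_start))
# {(10, 30, 44): 2, (11, 27, 1): 1}
```

Grouping on one integer key is also simpler for MySQL than grouping on three function calls per row.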

Don't FORCE INDEX -- it may help today, but hurt tomorrow.

Change AND (DB1.Data.datetimeOrgination BETWEEN UNIX_TIMESTAMP('2018-02-09 00:00:00') AND UNIX_TIMESTAMP('2018-02-09 23:59:59')) to

  AND  DB1.Data.datetimeOrgination >= UNIX_TIMESTAMP('2018-02-09')
  AND  DB1.Data.datetimeOrgination  < UNIX_TIMESTAMP('2018-02-09' + INTERVAL 1 DAY)

(Again, that is for clarity, not speed.)
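Because datetimeOrgination holds whole Unix seconds, both forms select the same rows, which is why the change is only for clarity. The Python sketch below checks this for one day. UTC is assumed here, whereas MySQL's UNIX_TIMESTAMP() interprets the date in the session time zone. The half-open form would start to matter if the column ever stored fractional seconds.

```python
from datetime import date, datetime, time, timedelta, timezone
from typing import Tuple


def day_bounds(d: date) -> Tuple[int, int]:
    """Half-open [lo, hi) range of Unix seconds covering day d (UTC assumed)."""
    start = datetime.combine(d, time(0, 0), tzinfo=timezone.utc)
    end = start + timedelta(days=1)
    return int(start.timestamp()), int(end.timestamp())


def in_day_half_open(ts: int, lo: int, hi: int) -> bool:
    # >= midnight AND < next midnight
    return lo <= ts < hi


def in_day_between(ts: int, lo: int, hi: int) -> bool:
    # BETWEEN midnight AND 23:59:59 (inclusive on both ends)
    return lo <= ts <= hi - 1


lo, hi = day_bounds(date(2018, 2, 9))
print(lo, hi)  # 1518134400 1518220800
# For whole-second integers, both forms select exactly the same rows:
print(all(in_day_half_open(t, lo, hi) == in_day_between(t, lo, hi)
          for t in range(lo - 5, hi + 5)))  # True
```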

Use COUNT(*) instead of COUNT(id).

I'm doing a lot of guessing; help us out by providing the output of SHOW CREATE TABLE. It smells like you are using the wrong datatype for datetimeOrgination.

After converting to seconds (from H,M,S), this

 datetimeOrgination < UNIX_TIMESTAMP(concat('2018-02-09',' ',T1.H,':',T1.M,':',T1.S))

becomes something like

 datetimeOrgination < UNIX_TIMESTAMP('2018-02-09' + INTERVAL secs SECOND)

Even better would be to extract the datetime from the subquery and move to something like

 datetimeOrgination < datetime_from_subquery

This may give a better chance of using the index.

Clean up the code and explain the goal; I'll try to come up with some more speedups.

(Since the definition of the problem keeps changing, I am starting a new answer.)

The number of calls (of all types) at a specific point in time is simply:

SELECT COUNT(*) FROM tbl
    WHERE call_start            <= '2018-02-14 15:11:35'
      AND call_start + duration >= '2018-02-14 15:11:35';

However, I will quibble that this count is "high", because it does not take into account what part of the given second the call started or ended in. So I think this is closer to correct:

SELECT COUNT(*) FROM tbl
    WHERE call_start            <  '2018-02-14 15:11:35'
      AND call_start + duration >= '2018-02-14 15:11:35';

This should come as close as possible to saying how many concurrent calls happened at exactly '2018-02-14 15:11:35.000000'; it is an approximation of the number for '2018-02-14 15:11:35.5'.
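To see how the two versions differ, here is a small Python sketch that runs both queries against an in-memory SQLite database (the standard-library sqlite3 module, standing in for MySQL). The table and column names are the answer's placeholders, the times are stored as integer seconds rather than datetimes, and the rows are made up.

```python
import sqlite3


def count_at(conn: sqlite3.Connection, t: int, inclusive_start: bool) -> int:
    # inclusive_start=True  -> first query  (call_start <= t)
    # inclusive_start=False -> second query (call_start <  t)
    op = "<=" if inclusive_start else "<"
    sql = ("SELECT COUNT(*) FROM tbl "
           f"WHERE call_start {op} ? AND call_start + duration >= ?")
    return conn.execute(sql, (t, t)).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (call_start INTEGER, duration INTEGER)")
# Made-up calls, in seconds: (call_start, duration)
conn.executemany("INSERT INTO tbl VALUES (?, ?)",
                 [(100, 10), (105, 5), (110, 3), (90, 10)])

print(count_at(conn, 105, True), count_at(conn, 105, False))  # 2 1
```

The only difference is the call that starts exactly at the queried second: the first query counts it, the second does not.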

By changing COUNT(*) to SUM(...) (as already discussed), you can get the count for a given type of call.

Then you add GROUP BY using datetime or timestamp arithmetic to finish out the task.
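One way to picture the GROUP BY step is to pair every second of the day with the calls active at that instant and count per second. In SQL that usually means joining to a helper table of seconds, which is an assumption; the answer does not specify it. The Python sketch does this directly, then again with a difference array, a swapped-in technique not taken from the answer. The difference array handles each call in constant time and then makes a single pass over the seconds. Calls are hypothetical (start, duration) pairs in seconds since midnight, counted with the second query's rule start < t <= start + duration.

```python
from typing import List, Tuple

DAY = 86400  # seconds in a day
Call = Tuple[int, int]  # (start, duration), seconds since midnight


def per_second_naive(calls: List[Call], n: int = DAY) -> List[int]:
    # Like joining every second t to the calls and grouping by t:
    # count calls with start < t <= start + duration.
    return [sum(1 for s, d in calls if s < t <= s + d) for t in range(n)]


def per_second_sweep(calls: List[Call], n: int = DAY) -> List[int]:
    # Difference array: +1 where a call starts being counted, -1 just after it stops.
    diff = [0] * (n + 1)
    for s, d in calls:
        first, last = s + 1, min(s + d, n - 1)  # clip calls that run past the end
        if first <= last:  # zero-length calls are never counted
            diff[first] += 1
            diff[last + 1] -= 1
    counts, running = [], 0
    for t in range(n):
        running += diff[t]
        counts.append(running)
    return counts


calls = [(2, 3), (3, 0), (4, 5), (0, 1), (10, 5)]
print(per_second_sweep(calls, n=12))
# [0, 1, 0, 1, 1, 2, 1, 1, 1, 1, 0, 1]
```

The naive version does work proportional to calls × seconds; the sweep does work proportional to calls + seconds and gives the same per-second counts.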

One day

To catch all calls that started during a single day:

WHERE call_start >= '2018-02-09'
  AND call_start  < '2018-02-09' + INTERVAL 1 DAY

Problem Definition is wrong

For calculate the concurrent calls per seconds, we should count 3 type of calls per second...

I contend that that is mathematically wrong.

"Concurrent calls" refers to an instant, not a whole second (or hour or day). It means "how many phone connections are in use at that instant".

Let me change the statement of the problem to "concurrent calls per hour". Does that make sense? You can ask about "calls per hour", which could be interpreted as "calls initiated per hour" and computed via datetimeOrgination and a GROUP BY.

Suppose I made a call at the start of each minute, and each lasted 59 seconds. A single phone line could handle that. I suggest that is "1 concurrent call".

In contrast, what if I had 60 people all starting their 59-second calls at noon? That would take 60 phone lines. That would be 60 concurrent calls during the busy time of the day.
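A quick check of the two scenarios with a Python sketch. It uses the instant-based definition (a call is active for start < t <= start + duration), with times in seconds since midnight as a hypothetical encoding. Both scenarios contain 60 calls in the hour, yet their peak concurrency differs.

```python
from typing import List, Tuple


def peak_concurrency(calls: List[Tuple[int, int]]) -> int:
    """Largest number of calls active at the same instant.

    A call (start, duration) is active on (start, start + duration].
    At equal timestamps, ends (-1) sort before starts (+1), so
    back-to-back calls do not overlap.
    """
    events = []
    for start, duration in calls:
        if duration > 0:
            events.append((start, 1))
            events.append((start + duration, -1))
    events.sort()
    active = peak = 0
    for _, delta in events:
        active += delta
        peak = max(peak, active)
    return peak


one_per_minute = [(60 * i, 59) for i in range(60)]  # 60 calls, one per minute
all_at_noon = [(12 * 3600, 59) for _ in range(60)]  # 60 calls starting together

print(peak_concurrency(one_per_minute), peak_concurrency(all_at_noon))  # 1 60
```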

The metric you have involves a datetimeOrgination that is truncated (or rounded?) to a 1-second boundary.

Now let me modify the example to better explain why your 3 steps are wrong. I want to group by hour, and I am willing to measure the number of calls at the top of the hour. In particular, let's look at the 10 o'clock hour.

  • 09:55 - 10:05 -- a 10-minute call that is counted, by your algorithm, in each of the 09 and 10 hours.
  • 10:20 - 10:30 -- a 10-minute call that is counted, by your algorithm, in only the 10 hour.

Why should a 10-minute call be counted as belonging to two hours? This inflates the "concurrency" count.

  • 09:05 - 10:55 -- a 110-minute call that is also counted in each of the 09 and 10 hours.
  • 09:30 - 11:30 -- a 120-minute call that is counted in 3 hours. Again, over-counting.
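Running the four calls above through both rules in a Python sketch makes the inflation visible. Times are minutes since midnight (a hypothetical encoding), and the "overlap" rule is an assumed reading of how the question's steps behave when the bucket is an hour instead of a second.

```python
from typing import Dict, List, Tuple

Call = Tuple[int, int]  # (start_minute, end_minute), minutes since midnight


def hm(h: int, m: int) -> int:
    return h * 60 + m


def hours_overlapped(calls: List[Call]) -> Dict[int, int]:
    # Overlap rule: a call is counted in every hour it touches.
    counts: Dict[int, int] = {}
    for start, end in calls:
        for h in range(start // 60, end // 60 + 1):
            counts[h] = counts.get(h, 0) + 1
    return counts


def active_at_top_of_hour(calls: List[Call], hours: range) -> Dict[int, int]:
    # Instant rule: count calls with start < h:00 <= end.
    return {h: sum(1 for s, e in calls if s < h * 60 <= e) for h in hours}


calls = [
    (hm(9, 55), hm(10, 5)),
    (hm(10, 20), hm(10, 30)),
    (hm(9, 5), hm(10, 55)),
    (hm(9, 30), hm(11, 30)),
]
print(hours_overlapped(calls))                     # {9: 3, 10: 4, 11: 1}
print(active_at_top_of_hour(calls, range(9, 12)))  # {9: 0, 10: 3, 11: 1}
```

The overlap rule assigns 8 hour-slots to only 4 calls. The top-of-hour rule counts only the calls actually in progress at 09:00, 10:00, and 11:00.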

So, I contend that the only reasonable computation is:

Step 1: Count all calls started at 10:51:20 -- counted as happening at the :20 instant.

Step 2: Count all calls that ended before 10:51:20 but did not start in the same second (20) -- not counted for :20.

Step 3: Count all calls started before 10:51:20 and ended after 10:51:20 -- counted for the :20 instant.

My suggested solution achieves that modification, and is both simpler and mathematically 'correct'.
