More efficient way to do SQL queries

Question

I've been using the below php and sql for loading schedule information and real time information for passenger trains in the UK. Essentially you have to find the relevant schedules, and then load the realtime information for each schedule which is in a different table relating to todays trains.

The query is taking a little longer than is really idea and using lots of CPU% which again isn''t ideal. I'm pretty weak when it comes to sql programming so any pointers as to what is inefficient would be great.

This is for an android app and so i've tried to all with one call over http. The prints(*) and > is for splitting the string at the other end.

Here is the code:

<?

//Connect to the database
 mysql_connect("localhost","XXXX","XXXX")
or die ("No connection could be made to the OpenRail Database");
mysql_select_db("autotrain");
//Set todays date from system and get HTTP parameters for the station,time to find trains         and todays locations table.
$date = date('Y-m-d');
$test = $_GET['station'];
$time = $_GET['time'];
$table = $_GET['table'];

//Find the tiploc associated with the station being searched.
$tiplocQuery = "SELECT tiploc_code FROM allstations WHERE c LIKE '$test';";
$tiplocResult =mysql_query($tiplocQuery);
$tiplocRow = mysql_fetch_assoc($tiplocResult);

$tiploc=$tiplocRow['tiploc_code'];
//Now find the timetabled trains for the station where there exists no departure     information. Goes back two hours to account for any late running.
$timeTableQuery = "SELECT tiplocs.tps_description AS 'C',     locations$table.public_departure, locations$table.id,schedules.stp_indicator
,schedules.train_uid
FROM locations$table, tiplocs, schedules_cache, schedules,activations
WHERE locations$table.id = schedules_cache.id
AND schedules_cache.id = schedules.id
AND schedules.id =activations.id
AND '$date'
BETWEEN schedules.date_from
AND schedules.date_to
AND locations$table.tiploc_code = '$tiploc'
AND locations$table.real_departure LIKE '0'
AND locations$table.public_departure NOT LIKE '0'
AND locations$table.public_departure >='$time'-300
AND locations$table.public_departure <='$time'+300
AND schedules.runs_th LIKE '1'
AND schedules_cache.destination = tiplocs.tiploc
ORDER BY locations$table.public_departure ASC
LIMIT 0,30;";

$timeTableResult=mysql_query($timeTableQuery);


while($timeTablerow = mysql_fetch_assoc($timeTableResult)){
$output[] = $timeTablerow;

}

//Now for each id returned in the timetable, get the locations and departure times so the app may calculate expected arrival times.
foreach ($output as $value) {
$id = $value['id'];
$realTimeQuery ="SELECT     locations$table.id,locations$table.location_order,locations$table.arrival,locations$table.public_arrival,
locations$table.real_arrival,locations$table.pass,locations$table.departure,locations$   table.public_departure,locations$table.real_departure,locations$table.location_cancelled,
tiplocs.tps_description FROM locations$table,tiplocs WHERE id =$id AND     locations$table.tiploc_code=tiplocs.tiploc;";

$realTimeResult =mysql_query($realTimeQuery);
while($row3 = mysql_fetch_assoc($realTimeResult)){
    $output3[] = $row3;
}
print json_encode($output3);
print("*");
unset($output3);
unset($id);
}


print('>');
print json_encode($output);

?>

Many Thanks Matt

Answer 1

A few things I noticed.

First, you are joining tables in the where clause, like this

from table1, table2
where table1.something - table2.something

Joining in the from clause is faster

from table1 join table2 on table1.something - table2.something

Next, I'm not a php programmer, but it looks like you are running similar queries inside a loop. If that's true, look for a way to run just one query.

Edit starts here

This is in response to gazarsgo's that I back up by claim about joins in the where clause being faster. He is right, I was wrong. This is what I did. The programming language is ColdFusion:

<cfsetting showdebugoutput="no">
<cfscript>
fromtimes = ArrayNew(1);
wheretimes = ArrayNew(1);
</cfscript>

<cfloop from="1" to="1000" index="idx">
<cfquery datasource="burns" name="fromclause" result="fromresult">
select count(distinct hscnumber)
from burns_patient p join burns_case c on p.patientid = c.patientid
</cfquery>
<cfset ArrayAppend(fromtimes, fromresult.executiontime)>

<cfquery datasource="burns" name="whereclause" result="whereresult">
select count(distinct hscnumber)
from burns_patient p, burns_case c 
where p.patientid = c.patientid
</cfquery>
<cfset ArrayAppend(wheretimes, whereresult.executiontime)>
</cfloop>
<cfdump var="#ArrayAvg(fromtimes)#" metainfo="no" label="from">
 <cfdump var="#ArrayAvg(wheretimes)#" metainfo="no" label="where">

I did ran it 5 times. The results, in milliseconds, follow.

 9.563 9.611
 9.498 9.584 
 9.625 9.548 
 9.831 9.769 
 9.792 9.813

The first number represents joining in the from clause, the second joining in the where clause. The first number is lower only 60% of the time. Had it been lower 100% percent of the time, it would have shown that joining in the from clause is faster, but that' not the case.

Answer 2

The biggest issue with your setup is this foreach loop because it is unnecessary and results in n number of round trips to the database to execute a query, fetch and analyze the results.

foreach ($output as $value) {

Rewrite the initial query to include all of the fields you will need to do your later calculations.

Something like this would work.

SELECT tl.tps_description AS 'C', lc.public_departure, lc.id, s.stp_indicator, s.train_uid,
lc.id, lc.location_order, lc.arrival, lc.public_arrival, lc.real_arrival, lc.pass, lc.departure, lc.real_departure, lc.location_cancelled
FROM locations$table lc INNER JOIN schedules_cache sc ON lc.id = sc.id
  INNER JOIN schedules s ON s.id = sc.id
  INNER JOIN activations a ON s.id = a.id
  INNER JOIN tiplocs tl ON sc.destination = tl.tiploc
WHERE '$date' BETWEEN schedules.date_from AND schedules.date_to
  AND lc.tiploc_code = '$tiploc'
  AND lc.real_departure LIKE '0'
  AND lc.public_departure NOT LIKE '0'
  AND lc.public_departure >='$time'-300
  AND lc.public_departure <='$time'+300
  AND s.runs_th LIKE '1'
ORDER BY lc.public_departure ASC
LIMIT 0,30;

Eliminating n query executions from your page load should dramatically increase response time.

Answer 3

Ignoring the problems with the code, in order to speed up your query, use the EXPLAIN command to evaluate where you need to add indexes to your query.

At a guess, you probably will want to create an index on whatever locations$table.public_departure evaluates to.

http://dev.mysql.com/doc/refman/5.0/en/using-explain.html

More efficient way to do SQL queries

Question

3 answers

solution1
0 2013-02-19 01:36:33

solution2
0 ACCPTED 2013-02-19 03:13:34

solution3
0 2013-02-19 03:26:58

More efficient way to do SQL queries

Question

3 answers

solution1 0 2013-02-19 01:36:33

solution2 0 ACCPTED 2013-02-19 03:13:34

solution3 0 2013-02-19 03:26:58

solution1
0 2013-02-19 01:36:33

solution2
0 ACCPTED 2013-02-19 03:13:34

solution3
0 2013-02-19 03:26:58