How can I speed up my cron job / database update?

I have a cron job that runs once every hour to update a local database with hourly data from an API.

The database stores the hourly data as rows (one row per ID and hour), and the API returns 24 data points covering the past 24 hours.

Sometimes a data point is missed, so when I get the data back I can't just update the latest hour: I also need to check which of the earlier points I already have and fill in any gaps I find.

Everything is running and working, but the cron job takes at least 30 minutes to complete every time. Is there any way to make it run faster or more efficiently?

My code does the following (simplified for brevity):

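// assumes an open PDO connection in $pdo and the current ID in $id

// prepare both statements once, outside the loop
$select = $pdo->prepare("SELECT reference FROM mydatabase WHERE id = ? AND hour = ?");
$insert = $pdo->prepare("INSERT INTO mydatabase (id, hour, data_01) VALUES (?, ?, ?)");

// loop through the 24 data points returned
for ($i = 0; $i < 24; $i++) {
    // ($thisDate, $thisTime and $thisData are read from data point $i; omitted here)

    // check if the data is for today, because the past 24 hours include data from yesterday
    if ($thisDate == $todaysDate) {

        // check if data for this id and this hour already exists
        $select->execute([$id, $thisTime]);
        $datafound = ($select->fetch() !== false);
        $select->closeCursor();

        // if it doesn't exist, insert it
        if (!$datafound) {
            $insert->execute([$id, $thisTime, $thisData]);
        }
    }
}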

There are 1500 different IDs, so this whole loop runs 1500 times!
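For a sense of scale, here is the upper bound on the number of existence checks per run (it assumes all 24 points fall on today, which is the most the loop can check) and, if the 30 minutes were spent entirely on those checks, the average time each one would take:

$$
1500 \text{ IDs} \times 24 \text{ points} = 36\,000 \text{ SELECT queries per run}
$$

$$
\frac{30 \times 60 \text{ s}}{36\,000 \text{ queries}} = 0.05 \text{ s} = 50 \text{ ms per query}
$$

Tens of milliseconds for a single-row lookup by id and hour usually points to the database scanning the table rather than using an index.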

Is there any way I can speed up or optimise this code?

This does not seem very complex; it should run in a few seconds. Without knowing your database, my first guess is that your table is missing an index. Check whether there is an index on your id field. If id is not your unique key, consider adding a composite index on the two fields id and hour. If these indexes aren't already there, adding them should save a massive amount of time.
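The index advice above can be checked by comparing the query plan of the existence check before and after adding the index. This is a minimal sketch assuming the table and column names from the question; the column types, reference being an INTEGER PRIMARY KEY, the sample hour format '13:00', the helper planFor and the index name idx_id_hour are all hypothetical. It uses an in-memory SQLite database only so it runs on its own; EXPLAIN QUERY PLAN is SQLite-specific, and the exact plan text varies between SQLite versions.

```php
<?php
// In-memory SQLite through PDO (needs pdo_sqlite), used only so the sketch is
// self-contained; the CREATE INDEX statement is the same on MySQL.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Hypothetical schema: column names from the question, types assumed.
$pdo->exec('CREATE TABLE mydatabase (
    reference INTEGER PRIMARY KEY,
    id        INTEGER NOT NULL,
    hour      TEXT    NOT NULL,
    data_01   TEXT
)');

// Hypothetical helper: SQLite's query plan for the existence check from the question.
function planFor(PDO $pdo): string
{
    $rows = $pdo->query(
        "EXPLAIN QUERY PLAN SELECT reference FROM mydatabase WHERE id = 1 AND hour = '13:00'"
    )->fetchAll(PDO::FETCH_ASSOC);
    return implode('; ', array_column($rows, 'detail'));
}

$before = planFor($pdo); // no index yet: a full table scan

// Composite index on (id, hour). id comes first, so the same index
// also serves queries that filter on id alone.
$pdo->exec('CREATE INDEX idx_id_hour ON mydatabase (id, hour)');

$after = planFor($pdo);  // now a search through idx_id_hour

echo "before: $before\nafter:  $after\n";
```

On the real MySQL table, running EXPLAIN on the same SELECT before and after adding the index should show the equivalent change, from a full scan of the table to a lookup through the new index.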

Another idea is to retrieve all data for the last 24 hours in a single SQL query, store the values in an array, and then check against that array whether you already have a data point, instead of querying the database for each one.
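The single-query idea can be sketched in PHP with PDO as follows. This is a minimal sketch, not the asker's code: it assumes the decoded API responses for all IDs have been collected into an array shaped [id => [hour => value, ...]], and that hour holds a sortable timestamp string such as '2024-01-02 13:00'. The function name fillGaps and that data layout are hypothetical.

```php
<?php
/**
 * Hypothetical helper: insert only the data points that are not already stored,
 * using one SELECT for the whole batch instead of one SELECT per data point.
 *
 * $apiData (assumed shape): [id => [hour => value, ...], ...]
 * Returns the number of rows inserted.
 */
function fillGaps(PDO $pdo, array $apiData): int
{
    // Collect every hour in this batch; the earliest one bounds the time window.
    $hours = [];
    foreach ($apiData as $points) {
        foreach ($points as $hour => $value) {
            $hours[] = (string) $hour;
        }
    }
    if ($hours === []) {
        return 0;
    }
    $since = min($hours); // assumes hour strings sort chronologically

    // One query: every (id, hour) pair already stored in this window.
    $select = $pdo->prepare('SELECT id, hour FROM mydatabase WHERE hour >= ?');
    $select->execute([$since]);
    $existing = [];
    foreach ($select->fetchAll(PDO::FETCH_ASSOC) as $row) {
        $existing[$row['id'] . '|' . $row['hour']] = true; // in-memory lookup set
    }

    // Prepare the INSERT once and reuse it for every missing point.
    $insert = $pdo->prepare('INSERT INTO mydatabase (id, hour, data_01) VALUES (?, ?, ?)');
    $inserted = 0;
    foreach ($apiData as $id => $points) {
        foreach ($points as $hour => $value) {
            if (!isset($existing[$id . '|' . $hour])) {
                $insert->execute([$id, (string) $hour, $value]);
                $inserted++;
            }
        }
    }
    return $inserted;
}
```

Called once per cron run with the data for all 1500 IDs, this issues a single SELECT plus one INSERT per missing point, instead of a SELECT for every point. The array of existing pairs stays small (at most 1500 × 24 entries), so keeping it in memory is cheap.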
