How to speed up MySQL and PHP?
I am developing a script on my localhost using PHP and MySQL, and I am dealing with large data (about 2 million records for scientific research).
Some queries I need to run only once (to analyse the data and prepare some data); however, this takes a very long time. For example, my script has now been analysing some data for more than 4 hours.
I know I might have some problems in the optimization of my database; I am not an expert.
For example, I just figured out that "indexing" can be useful to speed up queries; however, even after indexing some columns my script is still very slow.
Any idea how to speed up my script (in PHP and MySQL)?
I am using XAMPP as a server package.
Thanks a lot for your help.
Best regards
Update 1:
$sql = "select * from urls"; // 10,000 records of cached HTML documents
$result = $DB->query($sql);
while ($row = $DB->fetch_array($result)) {
    $url_id = $row["id"];
    $content = $row["content"];
    $dom = new DOMDocument();
    @$dom->loadHTML($content); // @ suppresses warnings about malformed HTML
    $xpath = new DOMXPath($dom);
    $links = $xpath->evaluate("/html/body//a"); // all links in the document
    for ($i = 0; $i < $links->length; $i++) {
        // lots of code here to deal with the HTML documents, plus some update, insert and select queries against another table that has 1 million records
    }
}
Update 2:
I do not have "JOIN" in my queries, or even "IN"; they are very simple queries.
And I don't know how to find out what causes the slowness. Is it the PHP or the MySQL?
First of all, to be able to optimize efficiently, you need to know what is taking time:
With that information, you can then try to figure out:
Still, this is quite a specific question, and the answers will probably be pretty specific too, which means more information might be necessary if you want more than a general answer...
Edit after your edits
As you only have simple queries, things might be a bit easier... Maybe.
"select * from a where x = 12" and "select * from a where x = 14" are of the same type: same select, same table, same where clause; only the value changes.
EXPLAIN will help.
microtime from PHP will help you find out which ones those are.
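To make the idea of "types of queries" and timing with microtime concrete, here is a minimal sketch. The helper names query_type() and timed_query() are made up for illustration, the literal-stripping is deliberately crude (it ignores escaped quotes), and the $fakeRun callable is a stand-in for whatever actually runs the query in your script (for example a closure around $DB->query).

```php
<?php
// Hypothetical helper: replace literal values with "?" so that queries
// differing only in their values fall into the same "type".
function query_type(string $sql): string
{
    $sql = preg_replace("/'[^']*'/", '?', $sql);   // quoted strings (no escape handling)
    $sql = preg_replace('/\b\d+\b/', '?', $sql);   // bare numbers
    return strtolower(trim(preg_replace('/\s+/', ' ', $sql)));
}

// Hypothetical helper: run a query through $run and add the elapsed
// wall-clock time to the totals kept per query type in $stats.
function timed_query(callable $run, string $sql, array &$stats)
{
    $start = microtime(true);
    $result = $run($sql);
    $elapsed = microtime(true) - $start;

    $type = query_type($sql);
    if (!isset($stats[$type])) {
        $stats[$type] = ['count' => 0, 'seconds' => 0.0];
    }
    $stats[$type]['count']++;
    $stats[$type]['seconds'] += $elapsed;
    return $result;
}

$stats = [];
// Stand-in for the real database call; it only sleeps for a millisecond.
$fakeRun = function (string $sql) {
    usleep(1000);
    return true;
};

timed_query($fakeRun, 'select * from a where x = 12', $stats);
timed_query($fakeRun, 'select * from a where x = 14', $stats);

// Slowest query types first.
uasort($stats, function (array $a, array $b) {
    return $b['seconds'] <=> $a['seconds'];
});
print_r($stats);
```

Both sample queries end up under the single type "select * from a where x = ?", so the totals show which kind of query is run most often and which one costs the most time overall. Those are the ones worth passing to EXPLAIN.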
Before that, to find out whether it is PHP or MySQL that is working too much, a simple way is to use the "top" command on Linux, or the "process manager" on Windows (I'm not on Windows, and don't use it in English; the real name might be something else).
If PHP is eating 100% of the CPU, you have your culprit. If MySQL is eating all the CPU, you have your culprit too.
When you know which one of those is working too much, that's a first step: you know what to optimize first.
I see from your portion of code that you are:
If you have a multi-core CPU, an idea (that I would try if I saw that PHP was eating lots of CPU) would be to parallelize.
For instance, you could have two instances of the PHP script running at the same time: one using "select * from urls where id < 5000" and the other using "select * from urls where id >= 5000".
You will get a bit more concurrency on the network (probably not a problem) and on the database (a database knows how to deal with concurrency, and 2 scripts using it will generally not be too much), but you'll be able to process almost twice as many documents in the same amount of time.
If you have 4 CPUs, splitting the URL list into 4 parts (or even more; find out by trial and error) would work too.
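The splitting described above could be sketched roughly as follows. The function split_id_range() and the script name process.php are hypothetical, and the sketch assumes the ids in urls run more or less contiguously from 1 to 10,000; if there are large gaps, the chunks will hold uneven numbers of rows.

```php
<?php
// Hypothetical helper: split the inclusive id range [$minId, $maxId]
// into at most $parts contiguous chunks of nearly equal size.
function split_id_range(int $minId, int $maxId, int $parts): array
{
    $total = $maxId - $minId + 1;
    $chunks = [];
    $start = $minId;
    for ($i = 0; $i < $parts; $i++) {
        // Spread the remainder over the first chunks so sizes differ by at most 1.
        $size = intdiv($total, $parts) + ($i < $total % $parts ? 1 : 0);
        if ($size === 0) {
            continue; // more parts than ids: skip empty chunks
        }
        $chunks[] = [$start, $start + $size - 1];
        $start += $size;
    }
    return $chunks;
}

// One line per worker; process.php and its two arguments are assumptions.
foreach (split_id_range(1, 10000, 4) as [$from, $to]) {
    echo "php process.php $from $to   # select * from urls where id between $from and $to\n";
}
```

Each printed command would be started in its own terminal (or in the background), so that every instance of the script only handles its own slice of the urls table.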
Since your query is on one table and has no grouping or ordering, it is unlikely that the query is slow. I expect the issue is the size and number of the content fields. It appears that you are storing the entire HTML of a webpage in your database and then pulling it out every time you want to change a couple of values on the page. This is a situation to be avoided if at all possible.
Most scientific webapps (like BLAST, for example) have the option to export the data as a delimited text file such as a CSV. If this is the case for you, you might consider restructuring your url table so that you have one column per data field in the CSV. Then your update queries will be significantly faster, as you will be able to do them entirely in SQL instead of pulling the entire url table into PHP, accessing and pulling one or more other records for each url record, and then updating your table.
Presumably you have stored your data as webpages so you can dump the content easily to a browser. If you change your database schema as I've suggested, you'll need to write a webpage template that you can plug the data into when you wish to output it.
Knowing the queries and table structures, it would be easier.
If you can't give them out, check whether you use the IN operator. MySQL tends to slow down too much there. Also try to run
EXPLAIN yourquery;
and see how it is executed. Sometimes sorting takes too much time. Try to avoid sorting on non-indexed columns.
Inner joins are usually quicker than left or right joins.
Going back through my queries and thinking specifically about the joins has always sped them up.
Have a look in your MySQL config for settings you can turn off, etc.
If you are not using indexes, that can be the main problem. There are many more optimization hints and tricks. It would be better to show, for example, your slowest query. It's not possible to help without any input data. Indexes and correct joins can speed this up a lot.
If the queries will return the same data, you can store the results in a file or in memory and run them just once.
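As a rough illustration of keeping repeated results in memory, here is a small memoizing wrapper. The names memoize() and $findLinkId are invented for this sketch, and the lookup body is only a stand-in for a real SELECT; this is only safe when the underlying rows do not change while the script runs.

```php
<?php
// Hypothetical helper: wrap a lookup so that each distinct key is
// resolved at most once per run. Keys must be strings or integers.
function memoize(callable $lookup): callable
{
    $cache = [];
    return function ($key) use (&$cache, $lookup) {
        if (!array_key_exists($key, $cache)) {
            $cache[$key] = $lookup($key);
        }
        return $cache[$key];
    };
}

$calls = 0;
// Stand-in for a real SELECT against the large table; it just counts calls.
$findLinkId = function (string $href) use (&$calls) {
    $calls++;
    return crc32($href);
};

$cachedFindLinkId = memoize($findLinkId);
$cachedFindLinkId('http://example.com/');
$cachedFindLinkId('http://example.com/'); // served from the in-memory cache

echo "lookups executed: $calls\n"; // prints "lookups executed: 1"
```

With many distinct keys the cache itself can grow large, so it is worth watching the memory use of the script.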
2 million records is not much.
Before you can optimise, you need to find out where the bottleneck is. Can you run the script on a smaller dataset, for testing purposes?
If so, you should set such a test up, and then profile the code. You can either use a dedicated profiler such as Xdebug, or, if you find it too daunting to configure (it's not that complicated really, but you sound like you're a bit in at the deep end already), you may feel more comfortable with a manual approach. This means starting a timer before parts of your code and stopping it after, then printing the result out. You can then narrow down which part is slowest.
Once you've got that, we can give more specific answers, or perhaps it will be apparent to you what to do.
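The manual approach could look something like the sketch below. The measure() helper and the two labels are made up, and both pieces of work are placeholders; in the real script they would wrap the loadHTML/XPath part and the database calls respectively.

```php
<?php
$timings = [];

// Hypothetical helper: run $work and add the elapsed wall-clock time
// to the running total stored under $label.
function measure(string $label, callable $work, array &$timings)
{
    $start = microtime(true);
    $result = $work();
    $timings[$label] = ($timings[$label] ?? 0.0) + (microtime(true) - $start);
    return $result;
}

$html = '<html><body><a href="/a">a</a><a href="/b">b</a></body></html>';

// Placeholder for the HTML parsing step: just counts opening <a> tags.
$links = measure('parse', function () use ($html) {
    return substr_count($html, '<a ');
}, $timings);

// Placeholder for the database step: sleeps instead of querying.
measure('database', function () {
    usleep(2000);
}, $timings);

foreach ($timings as $label => $seconds) {
    printf("%-10s %.4f s\n", $label, $seconds);
}
```

Because the totals accumulate per label, the same calls can sit inside the main loop, and the final printout shows where the time goes over the whole run.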