
How to speed up MySQL and PHP?

I am developing a script on my localhost using PHP and MySQL, and I am dealing with large data (about 2 million records for scientific research).

Some queries I only need to run once (to analyse the data and prepare some derived data); however, they take a very long time. For example, my script has now been analysing some data for more than 4 hours.

I know I might have some problems with the optimization of my database; I am not an expert.

For example, I just figured out that "indexing" can be useful to speed up queries; however, even after indexing some columns my script is still very slow.

Any idea how to speed up my script (in PHP and MySQL)?

I am using XAMPP as a server package.

Thanks a lot for the help.

Best regards

update 1:

Here is part of my slow script, which takes more than 4 hours to run:

$sql = "select * from urls"; // 10,000 records of cached HTML documents
$result = $DB->query($sql);
while ($row = $DB->fetch_array($result)) {
    $url_id  = $row["id"];
    $content = $row["content"];

    $dom = new DOMDocument();
    @$dom->loadHTML($content);
    $xpath = new DOMXPath($dom);
    // note: the original reused $row here, which overwrote the current DB row
    $links = $xpath->evaluate("/html/body//a");

    for ($i = 0; $i < $links->length; $i++) {
        // lots of code here to deal with the HTML documents, plus some update,
        // insert and select queries against another table with 1 million records
    }
}

update 2:

I do not have "JOIN" in my queries, or even "IN".

They are very simple queries.

And I don't know how to find out what is causing the slowness.

Is it PHP or MySQL?

First of all, to be able to optimize efficiently, you need to know what is taking the time:

  • is PHP doing too many calculations?
  • do you have too many SQL queries?
  • do you have SQL queries that take too much time?
    • if yes, which ones?
  • where is your script spending its time?

With that information, you can then try to figure out:

  • whether you can reduce the number of SQL queries
    • for instance, if you are running the exact same query over and over again, you are obviously wasting time
    • another idea is to "regroup" queries, if that is possible; for instance, use a single query to fetch 10 rows instead of 10 queries that each fetch one row back (see the sketch after this list)
  • whether you can optimize the queries that take too long
    • either by using indexes -- which indexes are useful generally depends on the joins and conditions you are using
    • or by re-writing the queries, if they are "bad"
    • about optimization of select statements, you can take a look at 7.2. Optimizing SELECT and Other Statements
  • if PHP is doing too many calculations, can you make it do fewer?
    • maybe by not recalculating similar things again and again?
    • or by using more efficient queries?
  • if PHP is taking the time, and the SQL server is not overloaded, using parallelism (launching several calculations at the same time) might also help speed up the whole thing.
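
As a rough illustration of "regrouping" queries: the sketch below fetches a batch of rows in one round trip instead of one query per row. The ids and the urls table are only placeholders, and the $DB wrapper is borrowed from the snippet in the question, so treat this as an assumption about your setup rather than working code for it.

// Example ids, purely for illustration
$ids = array(12, 14, 27);

// Before: N queries, one per id
foreach ($ids as $id) {
    $result = $DB->query("select * from urls where id = " . (int)$id);
    $row = $DB->fetch_array($result);
    // ... process $row ...
}

// After: a single query for the whole batch
$idList = implode(",", array_map("intval", $ids));
$result = $DB->query("select * from urls where id in ($idList)");
while ($row = $DB->fetch_array($result)) {
    // ... process $row ...
}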

Still: this is quite a specific question, and the answers will probably be pretty specific too -- which means more information might be necessary if you want more than a general answer...


Edit after your edits

As you only have simple queries, things might be a bit easier... Maybe.

  • First of all: you need to identify the kinds of queries you are doing.
    • I'm guessing that, out of all your queries, you can identify some "types" of queries.
    • for instance: "select * from a where x = 12" and "select * from a where x = 14" are of the same type: same select, same table, same where clause -- only the value changes
  • once you know which queries are used the most, you'll need to check whether they are optimized: using EXPLAIN will help (see the sketch after this list)
    • (if needed, I'm sure some people will be able to help you understand its output, if you provide it alongside the schema of your DB (tables + indexes))
    • if needed: create the right indexes -- that's kind of the hard/specific part ^^
    • it is also for those queries that reducing the number of queries might prove useful...
  • when you're done with the queries used most often, it's time to deal with the queries that take too long; using microtime from PHP will help you find out which ones those are
    • another solution is to use 5.2.4. The Slow Query Log
    • when you have identified those queries, same as before: optimize.
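
For example, here is a minimal sketch of running EXPLAIN on one of those frequent queries from PHP; the query itself is only a placeholder, and the $DB wrapper is again the one from your question:

// Inspect how MySQL executes one of the frequent queries
$result = $DB->query("EXPLAIN select * from urls where id < 5000");
while ($row = $DB->fetch_array($result)) {
    // The 'type', 'key' and 'rows' columns are the most interesting:
    // type = ALL with a large 'rows' value usually means a full table scan,
    // i.e. a missing index on the column used in the WHERE clause.
    print_r($row);
}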


Before that, to find out whether it is PHP or MySQL that is working too much, a simple way is to use the "top" command on Linux, or the "process manager" on Windows (I'm not on Windows, and don't use it in English -- the real name might be something else).

If PHP is eating 100% of the CPU, you have your culprit. If MySQL is eating all the CPU, you have your culprit too.

When you know which one of those is working too much, that's a first step: you know what to optimize first.


I see from your portion of code that you are:

  • going through 10,000 elements one by one -- it should be easy to split those into 2 or more slices
  • using DOM and XPath, which might eat some CPU on the PHP side

If you have a multi-core CPU, an idea (that I would try if I saw that PHP was eating lots of CPU) would be to parallelize.

For instance, you could have two instances of the PHP script running at the same time:

  • one that deals with the first half of the URLs
    • the SQL query for this one will be something like "select * from urls where id < 5000"
  • and the other one that deals with the second half of the URLs
    • its query will be something like "select * from urls where id >= 5000"
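
A possible sketch of that split, assuming the script is started from the command line and the range boundaries are passed as arguments; the file name, argument handling and $DB wrapper are assumptions based on your snippet, not your actual code:

// Run e.g. "php process_urls.php 0 5000" and "php process_urls.php 5000 10000"
// in two terminals at the same time, one per CPU core.
$min = isset($argv[1]) ? (int)$argv[1] : 0;
$max = isset($argv[2]) ? (int)$argv[2] : PHP_INT_MAX;

$sql = "select * from urls where id >= $min and id < $max";
$result = $DB->query($sql);
while ($row = $DB->fetch_array($result)) {
    // ... same DOM/XPath processing as before, but only on this slice ...
}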

You will get a bit more concurrency on the network (probably not a problem) and on the database (a database knows how to deal with concurrency, and 2 scripts using it will generally not be too much), but you'll be able to process almost twice as many documents in the same time.

If you have 4 CPUs, splitting the urls list into 4 parts (or even more; find out by trial and error) would work too.

Since your query is on one table and has no grouping or ordering, it is unlikely that the query itself is slow. I expect the issue is the size and number of the content fields. It appears that you are storing the entire HTML of a webpage in your database and then pulling it out every time you want to change a couple of values on the page. This is a situation to be avoided if at all possible.

Most scientific webapps (like BLAST, for example) have the option to export the data as a delimited text file such as a CSV. If this is the case for you, you might consider restructuring your url table so that you have one column per data field in the CSV. Your update queries will then be significantly faster, because you will be able to do them entirely in SQL instead of pulling the entire url table into PHP, accessing and pulling one or more other records for each url record, and then updating your table.
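
As a purely hypothetical sketch of what "entirely in SQL" could look like once each field has its own column (the urls_data table and its columns are made up, since your real schema isn't shown):

// Instead of selecting every row into PHP, computing, and updating row by row,
// one set-based statement can update all matching rows inside MySQL:
$DB->query("
    update urls_data
    set score = citation_count / year_count
    where year_count > 0
");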


Presumably you have stored your data as webpages so you can dump the content easily to a browser. If you change your database schema as I've suggested, you'll need to write a webpage template that you can plug the data into when you wish to output it.

Knowing your queries and table structures, it would be easier to help.

If you can't give them out, check whether you use the IN operator. MySQL tends to slow down a lot there. Also try to run

EXPLAIN yourquery;

and see how it is executed. Sometimes sorting takes too much time. Try to avoid sorting on non-indexed columns.

Inner joins are quicker than left or right joins.

Going through my queries afterwards and thinking specifically about the joins has always sped them up.

Have a look in your MySQL config for settings you can turn off, etc.

If you are not using indexes, that can be the main problem. There are many more optimization hints and tricks. It would be better to show, e.g., your slowest query. It's not possible to help without any input data. Indexes and correct joins can speed this up a lot.

If the queries return the same data, you can store the results in a file or in memory and run them just once.
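
A minimal sketch of that kind of in-memory caching, again assuming the $DB wrapper from the question (the helper function itself is made up):

// Cache query results in a static array so that an identical query
// is only sent to MySQL once per script run.
function cached_query($DB, $sql) {
    static $cache = array();
    if (!isset($cache[$sql])) {
        $rows = array();
        $result = $DB->query($sql);
        while ($row = $DB->fetch_array($result)) {
            $rows[] = $row;
        }
        $cache[$sql] = $rows;
    }
    return $cache[$sql];
}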

2 million records is not much.

Before you can optimise, you need to find out where the bottleneck is. Can you run the script on a smaller dataset, for testing purposes?

In that case, you should set such a test up, and then profile the code. You can either use a dedicated profiler such as Xdebug, or, if you find it too daunting to configure (not that complicated really, but you sound like you're a bit in the deep end already), you may feel more comfortable with a manual approach. This means starting a timer before parts of your code and stopping it after, then printing the result out. You can then narrow down which part is slowest.
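
For the manual approach, a minimal sketch using PHP's microtime might look like this (the label and the timed section are just examples):

// Time one section of the script manually
$start = microtime(true);

// ... the block you suspect is slow, e.g. the DOM/XPath parsing ...

$elapsed = microtime(true) - $start;
echo "HTML parsing took " . round($elapsed, 3) . " seconds\n";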

Once you've got that, we can give more specific answers, or perhaps it will be apparent to you what to do.
