MySQL query performs very slowly
I have developed a user bulk upload module. There are two situations. When I do a bulk upload of 20,000 records into a database with zero records, it takes about 5 hours. But when the database already has about 30,000 records, the upload is very, very slow: it takes about 11 hours to upload 20,000 records. I am just reading a CSV file via the fgetcsv method.
if (($handle = fopen($filePath, "r")) !== FALSE) {
    while (($peopleData = fgetcsv($handle, 10240, ",")) !== FALSE) {
        if (count($peopleData) == $fieldsCount) {
            // inside, I check whether the user already exists (firstName & lastName & DOB);
            // if not, I check whether the email exists; if it exists, update the record,
            // otherwise insert a new record.
        }
    }
}
Below are the queries that run (I am using the Yii framework):
SELECT *
FROM `AdvanceBulkInsert` `t`
WHERE renameSource='24851_bulk_people_2016-02-25_LE CARVALHO 1.zip.csv'
LIMIT 1
SELECT cf.*, ctyp.typeName, cfv.id as customId, cfv.customFieldId,
cfv.relatedId, cfv.fieldValue, cfv.createdAt
FROM `CustomField` `cf`
INNER JOIN CustomType ctyp on ctyp.id = cf.customTypeId
LEFT OUTER JOIN CustomValue cfv on cf.id = cfv.customFieldId
and relatedId = 0
LEFT JOIN CustomFieldSubArea cfsa on cfsa.customFieldId = cf.id
WHERE ((relatedTable = 'people' and enabled = '1')
AND (onCreate = '1'))
AND (cfsa.subarea='peoplebulkinsert')
ORDER BY cf.sortOrder, cf.label
SELECT *
FROM `User` `t`
WHERE `t`.`firstName`='Franck'
AND `t`.`lastName`='ALLEGAERT '
AND `t`.`dateOfBirth`='1971-07-29'
AND (userType NOT IN ("1"))
LIMIT 1
If the user exists, update the record:
UPDATE `User` SET `id`='51394', `address1`='49 GRANDE RUE',
`mobile`='', `name`=NULL, `firstName`='Franck',
`lastName`='ALLEGAERT ', `username`=NULL,
`password`=NULL, `email`=NULL, `gender`=0,
`zip`='60310', `countryCode`='DZ',
`joinedDate`='2016-02-23 10:44:18',
`signUpDate`='0000-00-00 00:00:00',
`supporterDate`='2016-02-25 13:26:37', `userType`=3,
`signup`=0, `isSysUser`=0, `dateOfBirth`='1971-07-29',
`reqruiteCount`=0, `keywords`='70,71,72,73,74,75',
`delStatus`=0, `city`='AMY', `isUnsubEmail`=0,
`isManual`=1, `isSignupConfirmed`=0, `profImage`=NULL,
`totalDonations`=NULL, `isMcContact`=NULL,
`emailStatus`=NULL, `notes`=NULL,
`addressInvalidatedAt`=NULL,
`createdAt`='2016-02-23 10:44:18',
`updatedAt`='2016-02-25 13:26:37', `longLat`=NULL
WHERE `User`.`id`='51394'
If the user doesn't exist, insert a new record.
The table engine type is MyISAM. Only the email column has an index.
How can I optimize this to reduce the processing time?
Query 2 took 0.4701 seconds, which means for 30,000 records it will take 14,103 seconds, about 235 minutes (approximately 4 hours).
Update
CREATE TABLE IF NOT EXISTS `User` (
`id` bigint(20) NOT NULL,
`address1` text COLLATE utf8_unicode_ci,
`mobile` varchar(15) COLLATE utf8_unicode_ci DEFAULT NULL,
`name` varchar(45) COLLATE utf8_unicode_ci DEFAULT NULL,
`firstName` varchar(64) COLLATE utf8_unicode_ci DEFAULT NULL,
`lastName` varchar(64) COLLATE utf8_unicode_ci DEFAULT NULL,
`username` varchar(20) COLLATE utf8_unicode_ci DEFAULT NULL,
`password` varchar(45) COLLATE utf8_unicode_ci DEFAULT NULL,
`email` varchar(45) COLLATE utf8_unicode_ci DEFAULT NULL,
`gender` tinyint(2) NOT NULL DEFAULT '0' COMMENT '1 - female, 2-male, 0 - unknown',
`zip` varchar(15) COLLATE utf8_unicode_ci DEFAULT NULL,
`countryCode` varchar(3) COLLATE utf8_unicode_ci DEFAULT NULL,
`joinedDate` datetime DEFAULT NULL,
`signUpDate` datetime NOT NULL COMMENT 'User signed up date',
`supporterDate` datetime NOT NULL COMMENT 'Date which user get supporter',
`userType` tinyint(2) NOT NULL,
`signup` tinyint(2) NOT NULL DEFAULT '0' COMMENT 'whether user followed signup process 1 - signup, 0 - not signup',
`isSysUser` tinyint(1) NOT NULL DEFAULT '0' COMMENT '1 - system user, 0 - not a system user',
`dateOfBirth` date DEFAULT NULL COMMENT 'User date of birth',
`reqruiteCount` int(11) DEFAULT '0' COMMENT 'User count that he has reqruited',
`keywords` text COLLATE utf8_unicode_ci COMMENT 'Kewords',
`delStatus` tinyint(2) NOT NULL DEFAULT '0' COMMENT '0 - active, 1 - deleted',
`city` varchar(50) COLLATE utf8_unicode_ci DEFAULT NULL,
`isUnsubEmail` tinyint(1) NOT NULL DEFAULT '0' COMMENT '0 - ok, 1 - Unsubscribed form email',
`isManual` tinyint(1) NOT NULL DEFAULT '0' COMMENT '0 - ok, 1 - Manualy add',
`longLat` varchar(45) COLLATE utf8_unicode_ci DEFAULT NULL COMMENT 'Longitude and Latitude',
`isSignupConfirmed` tinyint(4) NOT NULL DEFAULT '0' COMMENT 'Whether user has confirmed signup ',
`profImage` tinytext COLLATE utf8_unicode_ci COMMENT 'Profile image name or URL',
`totalDonations` float DEFAULT NULL COMMENT 'Total donations made by the user',
`isMcContact` tinyint(1) DEFAULT NULL COMMENT '1 - Mailchimp contact',
`emailStatus` tinyint(2) DEFAULT NULL COMMENT '1-bounced, 2-blocked',
`notes` text COLLATE utf8_unicode_ci,
`addressInvalidatedAt` datetime DEFAULT NULL,
`createdAt` datetime NOT NULL,
`updatedAt` datetime DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
CREATE TABLE IF NOT EXISTS `AdvanceBulkInsert` (
`id` int(11) NOT NULL,
`source` varchar(256) NOT NULL,
`renameSource` varchar(256) DEFAULT NULL,
`countryCode` varchar(3) NOT NULL,
`userType` tinyint(2) NOT NULL,
`size` varchar(128) NOT NULL,
`errors` varchar(512) NOT NULL,
`status` char(1) NOT NULL COMMENT '1:Queued, 2:In Progress, 3:Error, 4:Finished, 5:Cancel',
`createdAt` datetime NOT NULL,
`createdBy` int(11) NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
CREATE TABLE IF NOT EXISTS `CustomField` (
`id` int(11) NOT NULL,
`customTypeId` int(11) NOT NULL,
`fieldName` varchar(64) COLLATE utf8_unicode_ci DEFAULT NULL,
`relatedTable` varchar(64) COLLATE utf8_unicode_ci DEFAULT NULL,
`defaultValue` text COLLATE utf8_unicode_ci,
`sortOrder` int(11) NOT NULL DEFAULT '0',
`enabled` char(1) COLLATE utf8_unicode_ci DEFAULT '1',
`listItemTag` char(1) COLLATE utf8_unicode_ci DEFAULT NULL,
`required` char(1) COLLATE utf8_unicode_ci DEFAULT '0',
`onCreate` char(1) COLLATE utf8_unicode_ci DEFAULT '1',
`onEdit` char(1) COLLATE utf8_unicode_ci DEFAULT '1',
`onView` char(1) COLLATE utf8_unicode_ci DEFAULT '1',
`listValues` text COLLATE utf8_unicode_ci,
`label` varchar(64) COLLATE utf8_unicode_ci DEFAULT NULL,
`htmlOptions` text COLLATE utf8_unicode_ci
) ENGINE=MyISAM AUTO_INCREMENT=12 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
CREATE TABLE IF NOT EXISTS `CustomFieldSubArea` (
`id` int(11) NOT NULL,
`customFieldId` int(11) NOT NULL,
`subarea` varchar(256) COLLATE utf8_unicode_ci NOT NULL
) ENGINE=MyISAM AUTO_INCREMENT=43 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
CREATE TABLE IF NOT EXISTS `CustomValue` (
`id` int(11) NOT NULL,
`customFieldId` int(11) NOT NULL,
`relatedId` int(11) NOT NULL,
`fieldValue` text COLLATE utf8_unicode_ci,
`createdAt` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP
) ENGINE=MyISAM AUTO_INCREMENT=86866 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
The entire PHP code is here: http://pastie.org/10737962
Update 2
EXPLAIN output of the query:
Indexes are your friend.

UPDATE User ... WHERE id = ... desperately needs an index on id, probably a PRIMARY KEY.

Similarly for renameSource.
SELECT *
FROM `User` `t`
WHERE `t`.`firstName`='Franck'
AND `t`.`lastName`='ALLEGAERT '
AND `t`.`dateOfBirth`='1971-07-29'
AND (userType NOT IN ("1"))
LIMIT 1;
This needs INDEX(firstName, lastName, dateOfBirth); the fields can be in any order (in this case).

Look at each query to see what it needs, then add that INDEX to the table. Read my Cookbook on building indexes.
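A sketch of the suggested index additions in SQL, using the column names from the schema shown in the question (the secondary index names are illustrative):

```sql
-- Primary key for the UPDATE ... WHERE id = ... lookups
ALTER TABLE User ADD PRIMARY KEY (id);

-- Composite index for the duplicate-person check
ALTER TABLE User ADD INDEX idx_name_dob (firstName, lastName, dateOfBirth);

-- Index for the renameSource lookup on AdvanceBulkInsert
ALTER TABLE AdvanceBulkInsert ADD INDEX idx_renameSource (renameSource);
```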
Try these things to increase your query performance:

Compare id as a number, not a string: use User.id = 51394 instead of User.id = '51394'.

With ENGINE=MyISAM you cannot define foreign-key constraints between your tables; change the database engine to ENGINE=InnoDB, then add indexing such as foreign keys and, where useful, full-text indexes.

If I understand correctly, for every result of SELECT * FROM AdvanceBulkInsert ... you run a SELECT cf.* request, and for every SELECT cf.* you run a SELECT * FROM User.
I think the issue is that you send far too many requests to the database. I think you should merge all your select requests into only one big request.

For that:

replace the SELECT * FROM AdvanceBulkInsert by an EXISTS (SELECT ... FROM AdvanceBulkInsert WHERE ...) or a JOIN

replace the SELECT * FROM User by a NOT EXISTS (SELECT ... FROM User WHERE ...)

Then you call the update on all the results of the merged select.

You should also time your requests one by one to find which of them takes the most time, and you should also use ANALYZE to find which part of the request takes the time.
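A sketch of the EXISTS form, reusing the renameSource value from the question's first query (not tested against the real schema):

```sql
SELECT EXISTS (
    SELECT 1
    FROM AdvanceBulkInsert
    WHERE renameSource = '24851_bulk_people_2016-02-25_LE CARVALHO 1.zip.csv'
) AS batchExists;
```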
Edit:

Now I have seen your code. Some leads:

Do you have indexes for cf.customTypeId, cfv.customFieldId, cfsa.customFieldId, user.dateOfBirth, user.firstName and user.lastName?

You don't need a LEFT JOIN CustomFieldSubArea if you have a WHERE clause that uses CustomFieldSubArea; a simple JOIN CustomFieldSubArea is enough.

You launch query 2 many times with relatedId = 0; maybe you can save the result in a variable?

If you don't need sorted data, remove the ORDER BY cf.sortOrder, cf.label. Otherwise, add an index on (cf.sortOrder, cf.label).
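As a sketch of the LEFT JOIN point: because the WHERE clause requires cfsa.subarea to match, the outer join on CustomFieldSubArea can become an inner join with the same result (the unqualified columns are assumed to belong to CustomField, per the schema in the question):

```sql
SELECT cf.*, ctyp.typeName, cfv.id AS customId, cfv.customFieldId,
       cfv.relatedId, cfv.fieldValue, cfv.createdAt
FROM CustomField cf
INNER JOIN CustomType ctyp ON ctyp.id = cf.customTypeId
INNER JOIN CustomFieldSubArea cfsa ON cfsa.customFieldId = cf.id
LEFT OUTER JOIN CustomValue cfv ON cf.id = cfv.customFieldId AND cfv.relatedId = 0
WHERE cf.relatedTable = 'people' AND cf.enabled = '1'
  AND cf.onCreate = '1'
  AND cfsa.subarea = 'peoplebulkinsert'
ORDER BY cf.sortOrder, cf.label;
```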
When you need to find out why a query takes long, you need to inspect its individual parts. As you showed in the question, an EXPLAIN statement can help you very much. Usually the most important columns are:

I would have posted analytics for the 1st and 3rd queries, but they are both quite simple. Here is the breakdown for the query that gives you trouble:
EXPLAIN SELECT cf.*, ctyp.typeName, cfv.id as customId, cfv.customFieldId,
cfv.relatedId, cfv.fieldValue, cfv.createdAt
FROM `CustomField` `cf`
INNER JOIN CustomType ctyp on ctyp.id = cf.customTypeId
LEFT OUTER JOIN CustomValue cfv on cf.id = cfv.customFieldId
and relatedId = 0
LEFT JOIN CustomFieldSubArea cfsa on cfsa.customFieldId = cf.id
WHERE ((relatedTable = 'people' and enabled = '1')
AND (onCreate = '1'))
AND (cfsa.subarea='peoplebulkinsert')
ORDER BY cf.sortOrder, cf.label
Let me explain the list above. Bold columns absolutely must have an index. Joining tables is an expensive operation that otherwise needs to go through all rows of both tables. If you build an index on the joined columns, the DB engine will find a much faster and better way to do it. This should be common practice for any database.

The italic columns are not mandatory to index, but if you have a large number of rows (20,000 is a large amount) you should also index the columns that you use for searching; it might not have such an impact on processing speed, but it is worth the extra bit of time.

So you need to add indexes to those columns.

To verify the results, try running the EXPLAIN statement again after adding the indexes (and possibly a few other select/insert/update queries). The Extra column should say something like "Using index", and the possible_keys column should list the used keys (even two or more per join query).
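A minimal verification sketch, assuming the composite index on (firstName, lastName, dateOfBirth) has been added:

```sql
EXPLAIN SELECT *
FROM User
WHERE firstName = 'Franck'
  AND lastName = 'ALLEGAERT '
  AND dateOfBirth = '1971-07-29';
-- After indexing, the key column should name the new composite index
-- instead of the type column showing a full table scan (type = ALL).
```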
Side note: you have some typos in your code; you should fix them in case someone else needs to work on your code too: "reqruiteCount" as a table column and "fileUplaod" as an array index in your referenced code.
For my work, I have to load one CSV with 524 columns and 10k records daily. When I tried to parse it and add the records with PHP, it was horrible.

So I propose that you look at the documentation for LOAD DATA LOCAL INFILE.

I copy/paste my own code as an example; adapt it to your needs:
$dataload = 'LOAD DATA LOCAL INFILE "'.$filename.'"
REPLACE
INTO TABLE '.$this->csvTable.' CHARACTER SET "utf8"
FIELDS TERMINATED BY "\t"
IGNORE 1 LINES
';
$result = (bool)$this->db->query($dataload);
Where $filename is a local path to your CSV (you can use dirname(__FILE__) to get it).
This SQL command is very quick (just 1 or 2 seconds to add/update the entire CSV).
EDIT: read the docs; of course, you need a unique index on your user table for REPLACE to work. So you don't need to check whether the user exists or not, and you don't need to parse the CSV file with PHP.
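A sketch of the unique index that REPLACE relies on, assuming email is the identifying column (the index name is illustrative):

```sql
ALTER TABLE User ADD UNIQUE KEY uniq_email (email);
```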
You appear to have the possibility (probability?) of 3 queries for every single record. Those 3 queries are going to require 3 trips to the database (and if you are using Yii and storing the records in Yii objects, that might slow things down even more).

Can you add a unique key on first name / last name / DOB, and one on email address?

If so, you can just do INSERT ... ON DUPLICATE KEY UPDATE. This reduces it to a single query for each record, greatly speeding things up.

But the big advantage of this syntax is that you can insert/update many records at once (I normally stick to about 250), so there are even fewer trips to the database.

You can knock up a class that you just pass records to, and which does the insert when the number of records hits your chosen batch size. Also add a call in the destructor to insert any final records.

Another option is to read everything into a temp table and then use that as a source to join to your user table to do the updates/inserts. This requires a bit of effort with the indexes, but a bulk load into a temp table is quick, and updates from it with useful indexes will be fast. Using it as a source for the inserts should also be fast (if you exclude the records already updated).
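A sketch of the multi-row upsert, assuming a unique key on (firstName, lastName, dateOfBirth); the second row and the column list are illustrative, not the full User schema:

```sql
INSERT INTO User (firstName, lastName, dateOfBirth, city, updatedAt)
VALUES
    ('Franck', 'ALLEGAERT ', '1971-07-29', 'AMY',   NOW()),
    ('Jane',   'DOE',        '1980-01-15', 'PARIS', NOW())
ON DUPLICATE KEY UPDATE
    city      = VALUES(city),
    updatedAt = VALUES(updatedAt);
```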
The other issue appears to be your following query, but I am not sure where you execute it. It appears to only need to be executed once, in which case it might not matter too much. You haven't given the structure of the CustomType table, but it is joined to CustomField, and the field customTypeId has no index; hence that join will be slow. Similarly for the CustomValue and CustomFieldSubArea joins, which join on customFieldId: neither has an index on this field (hopefully a unique index, because if those fields are not unique you will get a LOT of records returned: one row for every possible combination).
SELECT cf.*, ctyp.typeName, cfv.id as customId, cfv.customFieldId,
cfv.relatedId, cfv.fieldValue, cfv.createdAt
FROM `CustomField` `cf`
INNER JOIN CustomType ctyp on ctyp.id = cf.customTypeId
LEFT OUTER JOIN CustomValue cfv on cf.id = cfv.customFieldId
and relatedId = 0
LEFT JOIN CustomFieldSubArea cfsa on cfsa.customFieldId = cf.id
WHERE ((relatedTable = 'people' and enabled = '1')
AND (onCreate = '1'))
AND (cfsa.subarea='peoplebulkinsert')
ORDER BY cf.sortOrder, cf.label
Given that, you could try reducing the query and checking the timings with an online SQL compiler before including it in your project.
Always do bulk importing within a transaction.
$transaction = Yii::app()->db->beginTransaction();
$curRow = 0;
try
{
    while (($peopleData = fgetcsv($handle, 10240, ",")) !== FALSE) {
        $curRow++;
        //process $peopleData
        //insert row
        //best to use INSERT ... ON DUPLICATE KEY UPDATE
        // a = 1
        // b = 2;
        if ($curRow % 5000 == 0) {
            $transaction->commit();
            $transaction = Yii::app()->db->beginTransaction();
        }
    }
    //don't forget the remainder.
    $transaction->commit();
}
catch (Exception $ex)
{
    $transaction->rollBack();
    $result = $ex->getMessage();
}
I have seen import routines sped up 500% simply by using this technique. I have also seen an import process that ran 600 queries (a mixture of select, insert, update, and show table structure) for each row. This technique sped that process up by 30%.