
MySQL query performs very slowly

I have developed a user bulk-upload module, and I see two situations. When I bulk-upload 20 000 records into an empty database, it takes about 5 hours. When the database already has about 30 000 records, the upload is much slower: it takes about 11 hours to upload 20 000 records. I am simply reading a CSV file with fgetcsv.

if (($handle = fopen($filePath, "r")) !== FALSE) {
    while (($peopleData = fgetcsv($handle, 10240, ",")) !== FALSE) {
        if (count($peopleData) == $fieldsCount) {
            // Inside, I check whether the user already exists (firstName & lastName & DOB).
            // If not, I check whether the email exists; if it does, I update that record.
            // Otherwise I insert a new record.
        }
    }
    fclose($handle);
}

These are the queries that run (I am using the Yii framework):

SELECT * 
FROM `AdvanceBulkInsert` `t` 
WHERE renameSource='24851_bulk_people_2016-02-25_LE CARVALHO 1.zip.csv' 
LIMIT 1

SELECT cf.*, ctyp.typeName, cfv.id as customId, cfv.customFieldId, 
       cfv.relatedId, cfv.fieldValue, cfv.createdAt 
FROM `CustomField` `cf` 
    INNER JOIN CustomType ctyp on ctyp.id = cf.customTypeId 
    LEFT OUTER JOIN CustomValue cfv on cf.id = cfv.customFieldId 
                and relatedId = 0 
    LEFT JOIN CustomFieldSubArea cfsa on cfsa.customFieldId = cf.id 
WHERE ((relatedTable = 'people' and enabled = '1') 
  AND (onCreate = '1')) 
  AND (cfsa.subarea='peoplebulkinsert') 
ORDER BY cf.sortOrder, cf.label

SELECT * 
FROM `User` `t` 
WHERE `t`.`firstName`='Franck' 
  AND `t`.`lastName`='ALLEGAERT ' 
  AND `t`.`dateOfBirth`='1971-07-29' 
  AND (userType NOT IN ("1")) 
LIMIT 1

If the user exists, update it:

UPDATE `User` SET `id`='51394', `address1`='49 GRANDE RUE', 
                  `mobile`='', `name`=NULL, `firstName`='Franck', 
                  `lastName`='ALLEGAERT ', `username`=NULL, 
                  `password`=NULL, `email`=NULL, `gender`=0, 
                  `zip`='60310', `countryCode`='DZ', 
                  `joinedDate`='2016-02-23 10:44:18', 
                  `signUpDate`='0000-00-00 00:00:00', 
                  `supporterDate`='2016-02-25 13:26:37', `userType`=3, 
                  `signup`=0, `isSysUser`=0, `dateOfBirth`='1971-07-29', 
                  `reqruiteCount`=0, `keywords`='70,71,72,73,74,75', 
                  `delStatus`=0, `city`='AMY', `isUnsubEmail`=0, 
                  `isManual`=1, `isSignupConfirmed`=0, `profImage`=NULL, 
                  `totalDonations`=NULL, `isMcContact`=NULL, 
                  `emailStatus`=NULL, `notes`=NULL, 
                  `addressInvalidatedAt`=NULL, 
                  `createdAt`='2016-02-23 10:44:18', 
                  `updatedAt`='2016-02-25 13:26:37', `longLat`=NULL 
WHERE `User`.`id`='51394'

If the user doesn't exist, insert a new record.

The table engine is MyISAM. Only the email column has an index.

How can I optimize this to reduce the processing time?

Query 2 took 0.4701 seconds, which means that for 30 000 records it will take about 14 103 seconds, i.e. about 235 minutes, or roughly 4 hours.

Update

CREATE TABLE IF NOT EXISTS `User` (
  `id` bigint(20) NOT NULL,
  `address1` text COLLATE utf8_unicode_ci,
  `mobile` varchar(15) COLLATE utf8_unicode_ci DEFAULT NULL,
  `name` varchar(45) COLLATE utf8_unicode_ci DEFAULT NULL,
  `firstName` varchar(64) COLLATE utf8_unicode_ci DEFAULT NULL,
  `lastName` varchar(64) COLLATE utf8_unicode_ci DEFAULT NULL,
  `username` varchar(20) COLLATE utf8_unicode_ci DEFAULT NULL,
  `password` varchar(45) COLLATE utf8_unicode_ci DEFAULT NULL,
  `email` varchar(45) COLLATE utf8_unicode_ci DEFAULT NULL,
  `gender` tinyint(2) NOT NULL DEFAULT '0' COMMENT '1 - female, 2-male, 0 - unknown',
  `zip` varchar(15) COLLATE utf8_unicode_ci DEFAULT NULL,
  `countryCode` varchar(3) COLLATE utf8_unicode_ci DEFAULT NULL,
  `joinedDate` datetime DEFAULT NULL,
  `signUpDate` datetime NOT NULL COMMENT 'User signed up date',
  `supporterDate` datetime NOT NULL COMMENT 'Date which user get supporter',
  `userType` tinyint(2) NOT NULL,
  `signup` tinyint(2) NOT NULL DEFAULT '0' COMMENT 'whether user followed signup process 1 - signup, 0 - not signup',
  `isSysUser` tinyint(1) NOT NULL DEFAULT '0' COMMENT '1 - system user, 0 - not a system user',
  `dateOfBirth` date DEFAULT NULL COMMENT 'User date of birth',
  `reqruiteCount` int(11) DEFAULT '0' COMMENT 'User count that he has reqruited',
  `keywords` text COLLATE utf8_unicode_ci COMMENT 'Kewords',
  `delStatus` tinyint(2) NOT NULL DEFAULT '0' COMMENT '0 - active, 1 - deleted',
  `city` varchar(50) COLLATE utf8_unicode_ci DEFAULT NULL,
  `isUnsubEmail` tinyint(1) NOT NULL DEFAULT '0' COMMENT '0 - ok, 1 - Unsubscribed form email',
  `isManual` tinyint(1) NOT NULL DEFAULT '0' COMMENT '0 - ok, 1 - Manualy add',
  `longLat` varchar(45) COLLATE utf8_unicode_ci DEFAULT NULL COMMENT 'Longitude and Latitude',
  `isSignupConfirmed` tinyint(4) NOT NULL DEFAULT '0' COMMENT 'Whether user has confirmed signup ',
  `profImage` tinytext COLLATE utf8_unicode_ci COMMENT 'Profile image name or URL',
  `totalDonations` float DEFAULT NULL COMMENT 'Total donations made by the user',
  `isMcContact` tinyint(1) DEFAULT NULL COMMENT '1 - Mailchimp contact',
  `emailStatus` tinyint(2) DEFAULT NULL COMMENT '1-bounced, 2-blocked',
  `notes` text COLLATE utf8_unicode_ci,
  `addressInvalidatedAt` datetime DEFAULT NULL,
  `createdAt` datetime NOT NULL,
  `updatedAt` datetime DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

CREATE TABLE IF NOT EXISTS `AdvanceBulkInsert` (
  `id` int(11) NOT NULL,
  `source` varchar(256) NOT NULL,
  `renameSource` varchar(256) DEFAULT NULL,
  `countryCode` varchar(3) NOT NULL,
  `userType` tinyint(2) NOT NULL,
  `size` varchar(128) NOT NULL,
  `errors` varchar(512) NOT NULL,
  `status` char(1) NOT NULL COMMENT '1:Queued, 2:In Progress, 3:Error, 4:Finished, 5:Cancel',
  `createdAt` datetime NOT NULL,
  `createdBy` int(11) NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

CREATE TABLE IF NOT EXISTS `CustomField` (
  `id` int(11) NOT NULL,
  `customTypeId` int(11) NOT NULL,
  `fieldName` varchar(64) COLLATE utf8_unicode_ci DEFAULT NULL,
  `relatedTable` varchar(64) COLLATE utf8_unicode_ci DEFAULT NULL,
  `defaultValue` text COLLATE utf8_unicode_ci,
  `sortOrder` int(11) NOT NULL DEFAULT '0',
  `enabled` char(1) COLLATE utf8_unicode_ci DEFAULT '1',
  `listItemTag` char(1) COLLATE utf8_unicode_ci DEFAULT NULL,
  `required` char(1) COLLATE utf8_unicode_ci DEFAULT '0',
  `onCreate` char(1) COLLATE utf8_unicode_ci DEFAULT '1',
  `onEdit` char(1) COLLATE utf8_unicode_ci DEFAULT '1',
  `onView` char(1) COLLATE utf8_unicode_ci DEFAULT '1',
  `listValues` text COLLATE utf8_unicode_ci,
  `label` varchar(64) COLLATE utf8_unicode_ci DEFAULT NULL,
  `htmlOptions` text COLLATE utf8_unicode_ci
) ENGINE=MyISAM AUTO_INCREMENT=12 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

CREATE TABLE IF NOT EXISTS `CustomFieldSubArea` (
  `id` int(11) NOT NULL,
  `customFieldId` int(11) NOT NULL,
  `subarea` varchar(256) COLLATE utf8_unicode_ci NOT NULL
) ENGINE=MyISAM AUTO_INCREMENT=43 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

CREATE TABLE IF NOT EXISTS `CustomValue` (
  `id` int(11) NOT NULL,
  `customFieldId` int(11) NOT NULL,
  `relatedId` int(11) NOT NULL,
  `fieldValue` text COLLATE utf8_unicode_ci,
  `createdAt` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP
) ENGINE=MyISAM AUTO_INCREMENT=86866 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

The entire PHP code is here: http://pastie.org/10737962

Update 2

EXPLAIN output of the query:

[Screenshot of the EXPLAIN output, not reproduced here]

Indexes are your friend.

UPDATE User ... WHERE id = ... -- Desperately needs an index on id, probably the PRIMARY KEY.

Similarly for renameSource.
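A minimal sketch of why these two indexes matter, written in Python with the built-in sqlite3 module so it runs without a MySQL server (SQLite's EXPLAIN QUERY PLAN stands in for MySQL's EXPLAIN). The two tables are hypothetical, trimmed-down versions of User and AdvanceBulkInsert; in MySQL the corresponding changes would be along the lines of ALTER TABLE `User` ADD PRIMARY KEY (id) and ALTER TABLE AdvanceBulkInsert ADD INDEX (renameSource).

```python
import sqlite3


def plan(conn, sql, params=()):
    """Return the 'detail' text of each row of SQLite's EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


conn = sqlite3.connect(":memory:")
# Hypothetical, trimmed-down tables. INTEGER PRIMARY KEY makes id the row key.
conn.execute("CREATE TABLE User (id INTEGER PRIMARY KEY, firstName TEXT, address1 TEXT)")
conn.execute("CREATE TABLE AdvanceBulkInsert (id INTEGER, renameSource TEXT)")

lookup = "SELECT * FROM AdvanceBulkInsert WHERE renameSource = ? LIMIT 1"
before = plan(conn, lookup, ("x.csv",))  # no index yet: full table scan

conn.execute("CREATE INDEX idx_abi_renameSource ON AdvanceBulkInsert (renameSource)")
after = plan(conn, lookup, ("x.csv",))   # now a SEARCH using the new index

# The per-row UPDATE goes straight to one row via the primary key.
update_plan = plan(conn, "UPDATE User SET address1 = ? WHERE id = ?", ("49 GRANDE RUE", 51394))

print(before)
print(after)
print(update_plan)
```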

SELECT * 
FROM `User` `t` 
WHERE `t`.`firstName`='Franck' 
  AND `t`.`lastName`='ALLEGAERT ' 
  AND `t`.`dateOfBirth`='1971-07-29' 
  AND (userType NOT IN ("1")) 
LIMIT 1;

Needs INDEX(firstName, lastName, dateOfBirth); the columns can be in any order (in this case).

Look at each query to see what it needs, then add that INDEX to the table. Read my Cookbook on building indexes.
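The same kind of sketch for the name/date-of-birth lookup (Python + sqlite3, hypothetical trimmed User table): one composite index serves all three equality conditions, and userType is only checked on the few rows the index finds. The MySQL equivalent would be ALTER TABLE `User` ADD INDEX (firstName, lastName, dateOfBirth).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE User (
    id INTEGER PRIMARY KEY, firstName TEXT, lastName TEXT,
    dateOfBirth TEXT, userType INTEGER, email TEXT)""")
# One composite index covering the three equality conditions of the lookup.
conn.execute("CREATE INDEX idx_user_name_dob ON User (firstName, lastName, dateOfBirth)")

conn.execute(
    "INSERT INTO User (firstName, lastName, dateOfBirth, userType) VALUES (?, ?, ?, ?)",
    ("Franck", "ALLEGAERT ", "1971-07-29", 3),
)

lookup = """SELECT id FROM User
            WHERE firstName = ? AND lastName = ? AND dateOfBirth = ?
              AND userType NOT IN (1)
            LIMIT 1"""
params = ("Franck", "ALLEGAERT ", "1971-07-29")

detail = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + lookup, params)]
print(detail)                                   # a SEARCH using idx_user_name_dob
print(conn.execute(lookup, params).fetchone())  # (1,)
```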

Try these things to increase your query performance:

  • Define indexes in your table structure, and select only the columns you need.
  • Do not use * in SELECT queries.
  • Do not put numeric ids in quotes like User.id='51394'; write User.id=51394 instead.
  • Note that for an integer column such as id, MySQL converts a quoted constant once and can still use the index; what does prevent index use is comparing a string column with an unquoted number, so keep literals the same type as the column.
  • MyISAM does not support foreign key constraints between tables. If you need them, change the engine to ENGINE=InnoDB, then create indexes such as foreign keys and, where useful, full-text indexes.

If I understand correctly, for every result of SELECT * FROM AdvanceBulkInsert ... you run a SELECT cf.* request, and for every result of SELECT cf.* you run SELECT * FROM User.

I think the issue is that you send way too many requests to the database.

I think you should merge all your SELECT requests into one big request.

For that:

Then run the UPDATE on all the rows returned by the merged SELECT.
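One way to act on this, sketched in Python with sqlite3 (the helper names load_existing_users and classify are hypothetical): fetch the keys of all existing users in a single query, then decide update versus insert for each CSV row with in-memory dictionary lookups instead of one or two SELECTs per row. It assumes the existing users fit comfortably in memory, which 30 000 rows easily do.

```python
import sqlite3


def load_existing_users(conn):
    """One round trip: map (firstName, lastName, dateOfBirth) -> id and email -> id."""
    by_name, by_email = {}, {}
    rows = conn.execute(
        "SELECT id, firstName, lastName, dateOfBirth, email FROM User WHERE userType <> 1"
    )
    for uid, first, last, dob, email in rows:
        by_name[(first, last, dob)] = uid
        if email:
            by_email[email] = uid
    return by_name, by_email


def classify(csv_rows, by_name, by_email):
    """Split CSV rows (dicts) into updates [(id, row)] and inserts, with no per-row query."""
    updates, inserts = [], []
    for row in csv_rows:
        uid = by_name.get((row["firstName"], row["lastName"], row["dateOfBirth"]))
        if uid is None and row.get("email"):
            uid = by_email.get(row["email"])  # fall back to the email match
        if uid is None:
            inserts.append(row)
        else:
            updates.append((uid, row))
    return updates, inserts


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE User (id INTEGER PRIMARY KEY, firstName TEXT, lastName TEXT, "
             "dateOfBirth TEXT, email TEXT, userType INTEGER)")
conn.executemany(
    "INSERT INTO User (firstName, lastName, dateOfBirth, email, userType) VALUES (?, ?, ?, ?, ?)",
    [("Franck", "ALLEGAERT ", "1971-07-29", None, 3),
     ("Ann", "Lee", "1980-01-01", "ann@example.com", 3)],
)
by_name, by_email = load_existing_users(conn)
csv_rows = [
    {"firstName": "Franck", "lastName": "ALLEGAERT ", "dateOfBirth": "1971-07-29", "email": ""},
    {"firstName": "Anne", "lastName": "Lee", "dateOfBirth": "1980-01-01", "email": "ann@example.com"},
    {"firstName": "New", "lastName": "Person", "dateOfBirth": "1990-05-05", "email": "new@example.com"},
]
updates, inserts = classify(csv_rows, by_name, by_email)
print([uid for uid, _ in updates], len(inserts))  # [1, 2] 1
```

The dictionaries are a snapshot taken before the import, so if the CSV itself may contain the same person twice, add each newly inserted row to them as well.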

You should also time your requests one by one to find which of them takes the most time, and use EXPLAIN (or EXPLAIN ANALYZE on MySQL 8.0.18 and later) to find which part of a request takes the time.

Edit:

Now that I have seen your code:

Some leads:

  • Do you have indexes on cf.customTypeId, cfv.customFieldId, cfsa.customFieldId, User.dateOfBirth, User.firstName and User.lastName?

  • You don't need a LEFT JOIN on CustomFieldSubArea when your WHERE clause filters on CustomFieldSubArea; a plain JOIN CustomFieldSubArea is enough.

  • You run query 2 many times with the same relatedId = 0; maybe you can save its result in a variable?

  • If you don't need sorted data, remove "ORDER BY cf.sortOrder, cf.label". Otherwise, add an index on (cf.sortOrder, cf.label).
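A minimal sketch of caching query 2 (Python + sqlite3; the CustomFieldCache class and the shortened query are hypothetical): the query runs on the first call and every later row reuses the stored result. This assumes the custom-field definitions do not change while an import is running.

```python
import sqlite3

# Shortened, hypothetical stand-in for query 2.
CUSTOM_FIELDS_SQL = (
    "SELECT id, label FROM CustomField "
    "WHERE relatedTable = 'people' AND enabled = '1' ORDER BY sortOrder, label"
)


class CustomFieldCache:
    """Runs the custom-field query once per import and reuses the result."""

    def __init__(self, conn):
        self.conn = conn
        self._fields = None
        self.queries_run = 0

    def fields(self):
        if self._fields is None:  # first call: hit the database
            self._fields = self.conn.execute(CUSTOM_FIELDS_SQL).fetchall()
            self.queries_run += 1
        return self._fields       # later calls: cached rows


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CustomField (id INTEGER PRIMARY KEY, relatedTable TEXT, "
             "enabled TEXT, sortOrder INTEGER, label TEXT)")
conn.executemany(
    "INSERT INTO CustomField (relatedTable, enabled, sortOrder, label) VALUES (?, ?, ?, ?)",
    [("people", "1", 2, "Shoe size"), ("people", "1", 1, "Nickname"), ("events", "1", 0, "Venue")],
)

cache = CustomFieldCache(conn)
for _ in range(20000):  # one iteration per CSV row
    fields = cache.fields()
print(cache.queries_run, fields)  # 1 [(2, 'Nickname'), (1, 'Shoe size')]
```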

When you need to find out why a query is slow, you need to inspect its individual parts. As you showed in the question, the EXPLAIN statement helps a lot. Usually the most important columns are:

  • select_type: this should ideally be SIMPLE or a plain SUBQUERY. Dependent (correlated) subqueries cause a lot of trouble; luckily you don't use any.
  • possible_keys: which keys this SELECT can use to search.
  • rows: how many candidate rows MySQL estimates it must examine after applying keys, caches and other techniques. A smaller number is better.
  • Extra: the "Using ..." entries tell you exactly how the rows are found; this is the most useful information.

Query analysis

I won't post an analysis of the 1st and 3rd queries, since both are quite simple. Here is the breakdown of the query that gives you trouble:

EXPLAIN SELECT cf.*, ctyp.typeName, cfv.id as customId, cfv.customFieldId, 
   cfv.relatedId, cfv.fieldValue, cfv.createdAt 
FROM `CustomField` `cf` 
    INNER JOIN CustomType ctyp on ctyp.id = cf.customTypeId 
    LEFT OUTER JOIN CustomValue cfv on cf.id = cfv.customFieldId 
                and relatedId = 0 
    LEFT JOIN CustomFieldSubArea cfsa on cfsa.customFieldId = cf.id 
WHERE ((relatedTable = 'people' and enabled = '1') 
  AND (onCreate = '1')) 
  AND (cfsa.subarea='peoplebulkinsert') 
ORDER BY cf.sortOrder, cf.label

  • INNER JOIN CustomType ctyp on ctyp.id = cf.customTypeId
  • LEFT OUTER JOIN CustomValue cfv on cf.id = cfv.customFieldId and relatedId = 0
  • LEFT JOIN CustomFieldSubArea cfsa on cfsa.customFieldId = cf.id
  • WHERE (( relatedTable = 'people' and enabled = '1') AND ( onCreate = '1')) AND ( cfsa.subarea ='peoplebulkinsert')
  • ORDER BY cf.sortOrder , cf.label

Solution

Let me explain the list above. The columns used in the joins (ctyp.id, cf.customTypeId, cf.id, cfv.customFieldId, cfsa.customFieldId) absolutely must have an index. Joining tables is an expensive operation that otherwise has to go through all rows of both tables; if you index the join columns, the database engine can find a much faster way to do it. This should be common practice for any database.

The remaining columns, used for filtering and sorting (relatedTable, enabled, onCreate, cfsa.subarea, sortOrder, label), don't strictly need an index, but with a large number of rows (20 000 is a large amount) you should also index the columns you search on. It might not have as big an impact on processing speed, but it is worth the extra bit of time.

So you need to add indexes on these columns:

  • CustomType - id
  • CustomField - customTypeId, id, relatedTable, enabled, onCreate, sortOrder, label
  • CustomValue - customFieldId
  • CustomFieldSubArea - customFieldId, subarea

To verify the results, run the EXPLAIN statement again after adding the indexes (and possibly a few other SELECT/INSERT/UPDATE queries). The Extra column should say something like "Using index", and the possible_keys column should list the keys used (even two or more per join query).
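A sketch of that verification step, again in Python with sqlite3 standing in for MySQL: it creates trimmed-down versions of the four tables (the CustomType columns are a guess, since its structure isn't shown), adds indexes on the columns listed above, and prints SQLite's query plan for the exact query. Combining relatedTable, enabled and onCreate into one composite index, and sortOrder and label into another, is an assumption of this sketch. The exact plan depends on the SQLite version, but the CustomValue lookup should show up as a SEARCH using idx_cv_field rather than a SCAN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CustomType (id INTEGER PRIMARY KEY, typeName TEXT);  -- columns assumed
CREATE TABLE CustomField (id INTEGER PRIMARY KEY, customTypeId INTEGER, relatedTable TEXT,
    enabled TEXT, onCreate TEXT, sortOrder INTEGER, label TEXT);
CREATE TABLE CustomValue (id INTEGER PRIMARY KEY, customFieldId INTEGER, relatedId INTEGER,
    fieldValue TEXT, createdAt TEXT);
CREATE TABLE CustomFieldSubArea (id INTEGER PRIMARY KEY, customFieldId INTEGER, subarea TEXT);

-- Join columns (the id columns are already indexed as primary keys).
CREATE INDEX idx_cf_type ON CustomField (customTypeId);
CREATE INDEX idx_cv_field ON CustomValue (customFieldId);
CREATE INDEX idx_cfsa_field ON CustomFieldSubArea (customFieldId);
-- Filter and sort columns.
CREATE INDEX idx_cf_filter ON CustomField (relatedTable, enabled, onCreate);
CREATE INDEX idx_cf_sort ON CustomField (sortOrder, label);
CREATE INDEX idx_cfsa_subarea ON CustomFieldSubArea (subarea);
""")

QUERY = """
SELECT cf.*, ctyp.typeName, cfv.id AS customId, cfv.customFieldId,
       cfv.relatedId, cfv.fieldValue, cfv.createdAt
FROM CustomField cf
    INNER JOIN CustomType ctyp ON ctyp.id = cf.customTypeId
    LEFT OUTER JOIN CustomValue cfv ON cf.id = cfv.customFieldId AND relatedId = 0
    LEFT JOIN CustomFieldSubArea cfsa ON cfsa.customFieldId = cf.id
WHERE ((relatedTable = 'people' AND enabled = '1') AND (onCreate = '1'))
  AND (cfsa.subarea = 'peoplebulkinsert')
ORDER BY cf.sortOrder, cf.label
"""

# Each line names a table and whether it is SCANned or SEARCHed via an index.
details = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY)]
for line in details:
    print(line)
```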

Side note: you have some typos in your code that you should fix in case someone else needs to work on it too: "reqruiteCount" as a table column and "fileUplaod" as an array index in the code you referenced.

For my work, I have to load one CSV with 524 columns and 10k records every day. When I tried to parse it and add the records with PHP, it was horribly slow.

So I suggest you read the documentation for LOAD DATA LOCAL INFILE.

Here is my own code as an example; adapt it to your needs:

$dataload = 'LOAD DATA LOCAL INFILE "'.$filename.'"
                REPLACE
                INTO TABLE '.$this->csvTable.' CHARACTER SET "utf8"
                FIELDS TERMINATED BY "\t"
                IGNORE 1 LINES
            ';

$result = (bool)$this->db->query($dataload);

where $filename is the local path of your CSV file (you can use dirname(__FILE__) to build it).

This SQL command is very fast: just 1 or 2 seconds to add/update the whole CSV.

EDIT: read the docs; of course, REPLACE only works if you have a unique index on your User table. With it, you don't need to check whether the user exists, and you don't need to parse the CSV file with PHP.
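SQLite has no LOAD DATA INFILE, so this Python sketch only mimics the REPLACE part: the csv module reads the file, the header row is skipped like IGNORE 1 LINES, and INSERT OR REPLACE (SQLite's counterpart of REPLACE) relies on a hypothetical unique index on (firstName, lastName, dateOfBirth). In both MySQL and SQLite, REPLACE deletes the conflicting row and inserts a new one, so columns missing from the file fall back to their defaults and an auto-generated id may change.

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE User (id INTEGER PRIMARY KEY, firstName TEXT, lastName TEXT, "
             "dateOfBirth TEXT, city TEXT)")
# REPLACE only knows a row is a duplicate if a UNIQUE index says so.
conn.execute("CREATE UNIQUE INDEX uq_user_name_dob ON User (firstName, lastName, dateOfBirth)")


def load_csv(conn, text):
    reader = csv.reader(io.StringIO(text))
    next(reader)  # like IGNORE 1 LINES: skip the header row
    with conn:    # one transaction for the whole file
        conn.executemany(
            "INSERT OR REPLACE INTO User (firstName, lastName, dateOfBirth, city) "
            "VALUES (?, ?, ?, ?)",
            reader,
        )


load_csv(conn, "firstName,lastName,dateOfBirth,city\nFranck,ALLEGAERT,1971-07-29,AMY\n")
load_csv(conn, "firstName,lastName,dateOfBirth,city\nFranck,ALLEGAERT,1971-07-29,PARIS\n")
print(conn.execute("SELECT COUNT(*), city FROM User").fetchone())  # (1, 'PARIS')
```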

You appear to have the possibility (probability?) of 3 queries for every single record. Those 3 queries require 3 round trips to the database (and if you are using Yii and storing the records in Yii objects, that might slow things down even more).

Can you add a unique key on first name / last name / DOB and one on email address?

If so, you can just use INSERT ... ON DUPLICATE KEY UPDATE. This reduces it to a single query per record, greatly speeding things up.

But the big advantage of this syntax is that you can insert/update many records at once (I normally stick to about 250), so there are even fewer trips to the database.
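A sketch of batched upserts in Python with sqlite3 (the upsert_batch and upsert_all helpers are hypothetical). SQLite spells the clause ON CONFLICT (...) DO UPDATE SET col = excluded.col (SQLite 3.24 or later); in MySQL the same statement would end in ON DUPLICATE KEY UPDATE city = VALUES(city). The default batch of 200 rows × 4 columns stays under the 999 bound-parameter limit of older SQLite builds; the figure of about 250 above is for MySQL.

```python
import sqlite3

COLUMNS = ("firstName", "lastName", "dateOfBirth", "city")


def upsert_batch(conn, rows):
    """Insert or update many rows with ONE statement (one round trip)."""
    placeholders = ", ".join(["(?, ?, ?, ?)"] * len(rows))
    sql = (
        f"INSERT INTO User ({', '.join(COLUMNS)}) VALUES {placeholders} "
        "ON CONFLICT (firstName, lastName, dateOfBirth) DO UPDATE SET city = excluded.city"
    )
    conn.execute(sql, [value for row in rows for value in row])


def upsert_all(conn, rows, batch_size=200):
    for start in range(0, len(rows), batch_size):
        upsert_batch(conn, rows[start:start + batch_size])


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE User (id INTEGER PRIMARY KEY, firstName TEXT, lastName TEXT, "
             "dateOfBirth TEXT, city TEXT, UNIQUE (firstName, lastName, dateOfBirth))")
upsert_all(conn, [("Franck", "ALLEGAERT", "1971-07-29", "AMY"),
                  ("Ann", "Lee", "1980-01-01", "LYON")], batch_size=2)
upsert_all(conn, [("Franck", "ALLEGAERT", "1971-07-29", "PARIS"),
                  ("Bo", "Kim", "1975-03-03", "NICE")], batch_size=2)
print(conn.execute("SELECT firstName, city FROM User ORDER BY id").fetchall())
# [('Franck', 'PARIS'), ('Ann', 'LYON'), ('Bo', 'NICE')]
```

Unlike REPLACE, the upsert updates the existing row in place, so Franck keeps id 1. With batches of 200, 20 000 CSV rows become 100 statements instead of tens of thousands.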

You can knock up a class that you just pass records to and that does the insert when the number of records reaches your chosen batch size. Also add a call in the destructor to insert any final records.
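A minimal Python version of such a class (BatchWriter is a hypothetical name). Python does not guarantee when __del__ runs, so this sketch uses a with block to flush the final partial batch where the PHP version would use the destructor. The flush function is passed in, so it can be any batch-insert helper.

```python
class BatchWriter:
    """Collects rows and hands them to flush_fn in batches of batch_size."""

    def __init__(self, flush_fn, batch_size=250):
        self.flush_fn = flush_fn
        self.batch_size = batch_size
        self.pending = []

    def add(self, row):
        self.pending.append(row)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.pending:
            self.flush_fn(self.pending)
            self.pending = []  # start a fresh list; the flushed one is not reused

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Plays the role of the destructor: write whatever is left over,
        # but only if the import loop finished without an exception.
        if exc_type is None:
            self.flush()
        return False  # never swallow exceptions


batches = []
with BatchWriter(batches.append, batch_size=250) as writer:
    for i in range(600):
        writer.add(i)
print([len(b) for b in batches])  # [250, 250, 100]
```

Skipping the final flush on error is a design choice; if a failed import should still write its last partial batch, call flush() unconditionally in __exit__.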

Another option is to read everything into a temporary table and then use that as the source to join to your User table for the updates/inserts. This requires a bit of effort with the indexes, but a bulk load into a temp table is quick, and updates from it with useful indexes would be fast. Using it as the source for the inserts should also be fast (if you exclude the records already updated).
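A sketch of the temp-table approach in Python with sqlite3 (the staging table and import_via_staging function are hypothetical, and only city is updated to keep it short): the CSV rows are bulk-loaded into a temporary table and indexed, then one UPDATE handles existing users and one INSERT ... SELECT ... WHERE NOT EXISTS adds the new ones. The correlated-subquery UPDATE is used because SQLite's UPDATE ... FROM needs version 3.33+; in MySQL you would write a multi-table UPDATE User JOIN staging ... SET ... instead.

```python
import sqlite3

KEY_MATCH = ("s.firstName = User.firstName AND s.lastName = User.lastName "
             "AND s.dateOfBirth = User.dateOfBirth")


def import_via_staging(conn, rows):
    with conn:  # everything below runs in one transaction
        conn.execute("CREATE TEMP TABLE staging "
                     "(firstName TEXT, lastName TEXT, dateOfBirth TEXT, city TEXT)")
        conn.executemany("INSERT INTO staging VALUES (?, ?, ?, ?)", rows)
        # Index the staging table so the matching below is a lookup, not a scan.
        conn.execute("CREATE INDEX temp.idx_staging_key "
                     "ON staging (firstName, lastName, dateOfBirth)")
        # 1) Update users that already exist.
        conn.execute(f"""UPDATE User SET city =
                             (SELECT s.city FROM staging s WHERE {KEY_MATCH})
                         WHERE EXISTS (SELECT 1 FROM staging s WHERE {KEY_MATCH})""")
        # 2) Insert the rows that matched nobody.
        conn.execute("""INSERT INTO User (firstName, lastName, dateOfBirth, city)
                        SELECT s.firstName, s.lastName, s.dateOfBirth, s.city
                        FROM staging s
                        WHERE NOT EXISTS (SELECT 1 FROM User u
                                          WHERE u.firstName = s.firstName
                                            AND u.lastName = s.lastName
                                            AND u.dateOfBirth = s.dateOfBirth)""")
        conn.execute("DROP TABLE staging")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE User (id INTEGER PRIMARY KEY, firstName TEXT, lastName TEXT, "
             "dateOfBirth TEXT, city TEXT)")
conn.execute("CREATE INDEX idx_user_key ON User (firstName, lastName, dateOfBirth)")
conn.execute("INSERT INTO User (firstName, lastName, dateOfBirth, city) "
             "VALUES ('Franck', 'ALLEGAERT', '1971-07-29', 'AMY')")
conn.commit()

import_via_staging(conn, [("Franck", "ALLEGAERT", "1971-07-29", "PARIS"),
                          ("Bo", "Kim", "1975-03-03", "NICE")])
print(conn.execute("SELECT id, firstName, city FROM User ORDER BY id").fetchall())
# [(1, 'Franck', 'PARIS'), (2, 'Bo', 'NICE')]
```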

The other issue appears to be the following query, but I am not sure where you execute it. It appears to need to run only once, in which case it might not matter much. You haven't given the structure of the CustomType table, but it is joined to CustomField, and the customTypeId field has no index, so that join will be slow. The same goes for the CustomValue and CustomFieldSubArea joins, which join on customFieldId, and neither table has an index on that field (hopefully a unique index; if those values are not unique you will get a LOT of rows back, one for every possible combination).

SELECT cf.*, ctyp.typeName, cfv.id as customId, cfv.customFieldId, 
       cfv.relatedId, cfv.fieldValue, cfv.createdAt 
FROM `CustomField` `cf` 
    INNER JOIN CustomType ctyp on ctyp.id = cf.customTypeId 
    LEFT OUTER JOIN CustomValue cfv on cf.id = cfv.customFieldId 
                and relatedId = 0 
    LEFT JOIN CustomFieldSubArea cfsa on cfsa.customFieldId = cf.id 
WHERE ((relatedTable = 'people' and enabled = '1') 
  AND (onCreate = '1')) 
  AND (cfsa.subarea='peoplebulkinsert') 
ORDER BY cf.sortOrder, cf.label

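Looking at it, you could try to reduce the number of queries, check their timing with an online SQL compiler, and then include them in the project.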

Always do bulk imports within a transaction. (This requires a transactional engine such as InnoDB; MyISAM tables ignore transactions.)

        $db = Yii::app()->db;
        $transaction = $db->beginTransaction();
        $curRow = 0;
        try
        {
            while (($peopleData = fgetcsv($handle, 10240, ",")) !== FALSE) {
                $curRow++;
                // process $peopleData
                // insert the row, preferably with INSERT ... ON DUPLICATE KEY UPDATE
                //     a = 1,
                //     b = 2;
                if ($curRow % 5000 == 0) {
                    // commit this batch and open a new transaction for the next one
                    $transaction->commit();
                    $transaction = $db->beginTransaction();
                }
            }
            // don't forget the remainder
            $transaction->commit();
        }
        catch (Exception $ex)
        {
            // only the current, uncommitted batch is rolled back
            $transaction->rollback();
            $result = $ex->getMessage();
        }

I have seen import routines sped up 500% by simply using this technique. I have also seen an import process that did 600 queries (mixture of select, insert, update and show table structure) for each row. This technique sped up the process 30%.
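The same commit-every-N-rows pattern, sketched in Python with sqlite3 (import_rows is a hypothetical helper and the two-column User table is trimmed down): each commit ends a batch, and on an error only the current, uncommitted batch is rolled back.

```python
import sqlite3


def import_rows(conn, rows, commit_every=5000):
    cur = conn.cursor()
    done = 0
    try:
        for row in rows:
            cur.execute("INSERT INTO User (firstName, lastName) VALUES (?, ?)", row)
            done += 1
            if done % commit_every == 0:
                conn.commit()  # end this batch; the next INSERT opens a new transaction
        conn.commit()          # don't forget the remainder
    except Exception:
        conn.rollback()        # only the current, uncommitted batch is lost
        raise
    return done


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE User (id INTEGER PRIMARY KEY, firstName TEXT, lastName TEXT NOT NULL)")
rows = [("First", f"Last{i}") for i in range(7000)] + [("Broken", None)]
try:
    import_rows(conn, rows, commit_every=5000)
except sqlite3.IntegrityError:
    pass
print(conn.execute("SELECT COUNT(*) FROM User").fetchone()[0])  # 5000
```

The broken last row violates NOT NULL, so the 2000 rows of the unfinished second batch are rolled back while the first 5000 stay committed.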

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address. For any questions, contact: yoyou2525@163.com.

 
Guangdong ICP Filing No. 18138465  © 2020-2024 STACKOOM.COM