Most efficient way to execute a large SQL query using PHP?

Every three months I am required to upload a CSV file containing around 400,000 products and insert them into a MySQL database. I don't feel my method is very efficient and would like some suggestions.

Currently I parse the CSV file like so:

public function parse_csv_to_array() {

    // Parsed rows, the header fields, and a running row index
    $array = $fields = array();

    $interval = 0;

    // File Handle
    $handle = @fopen($this->csvFile, "r");

    if ($handle) {

        while (($row = fgetcsv($handle, 4096)) !== false) {

            // The first row of the CSV holds the column headings
            if (empty($fields)) {
                $fields = $row;
                continue;
            }

            // Key each value by its column heading
            foreach ($row as $k => $value) {
                $array[$interval][$fields[$k]] = $value;
            }

            $interval++;
        }

        if (!feof($handle)) {
            echo "Error: unexpected fgetcsv() failure\n";
        }

        fclose($handle);
    }

    return $array;
}

I then simply loop through the array, inserting a new record or replacing an existing one if it is already present. This means I am executing at least 1.2 million SQL queries: first checking whether each record is present, then inserting or replacing it in the database.
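
To illustrate, that per-row pattern looks roughly like the sketch below. The table and column names (products, sku, name, price) are only placeholders, since the actual insert code is not shown here.

$pdo = new PDO('mysql:host=localhost;dbname=shop;charset=utf8mb4', 'user', 'pass');

// Placeholder schema: a products table with sku, name and price columns
$check  = $pdo->prepare("SELECT 1 FROM products WHERE sku = ?");
$insert = $pdo->prepare("INSERT INTO products (sku, name, price) VALUES (?, ?, ?)");
$update = $pdo->prepare("UPDATE products SET name = ?, price = ? WHERE sku = ?");

foreach ($array as $product) {
    // One query per row just to check whether the record exists...
    $check->execute(array($product['sku']));

    if ($check->fetchColumn()) {
        // ...then a second query to replace it...
        $update->execute(array($product['name'], $product['price'], $product['sku']));
    } else {
        // ...or to insert it - several statements for every one of the 400,000 rows
        $insert->execute(array($product['sku'], $product['name'], $product['price']));
    }
    $check->closeCursor();
}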

Currently this is done via an HTML5 form upload and runs in the user's browser once they click submit. The whole process can take up to 30 minutes, which I don't think is bad, but I have had to set the PHP script's timeout to unlimited to allow it to run. I don't feel this is very efficient, and it increases the load on the server considerably. I was wondering whether there are methods of segmenting the array and uploading the records in partitions, or whether I should be using a scheduler such as cron. The idea of executing 1.2 million SQL queries in one script feels dirty, and there has to be a better way. Any suggestions would be welcome.

You can do one query to bring back all the existing records, store them in an array, compare the data in the CSV with the values in that array, and update only when necessary. You can also build an array containing just the values that need to be updated and then do a bulk insert.

With this method you are not making as many requests to the database, so it should be less resource intensive.
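
A rough sketch of that idea, assuming a products table with a unique key on sku and the same placeholder columns as above (adjust the names to your schema):

$pdo = new PDO('mysql:host=localhost;dbname=shop;charset=utf8mb4', 'user', 'pass');

// One query to load what is already in the table, keyed by SKU for fast lookups
$existing = array();
foreach ($pdo->query("SELECT sku, name, price FROM products") as $row) {
    $existing[$row['sku']] = $row;
}

// Keep only the CSV rows that are new or have actually changed
$pending = array();
foreach ($array as $product) {
    $sku = $product['sku'];
    if (!isset($existing[$sku])
        || $existing[$sku]['name'] != $product['name']
        || $existing[$sku]['price'] != $product['price']) {
        $pending[] = $product;
    }
}

// Write the remaining rows in multi-row statements, 1,000 rows per query
foreach (array_chunk($pending, 1000) as $chunk) {
    $placeholders = implode(',', array_fill(0, count($chunk), '(?, ?, ?)'));
    $params = array();
    foreach ($chunk as $product) {
        $params[] = $product['sku'];
        $params[] = $product['name'];
        $params[] = $product['price'];
    }
    $stmt = $pdo->prepare(
        "INSERT INTO products (sku, name, price) VALUES $placeholders
         ON DUPLICATE KEY UPDATE name = VALUES(name), price = VALUES(price)"
    );
    $stmt->execute($params);
}

With 400,000 products this reduces the work to one SELECT plus a few hundred bulk statements at most, instead of one or more queries per row.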

I think using chunks and a cron job would be the best solution. Run your cron every few minutes to look for new data and upload it to the database when it is found; that way the import runs in the background.

To speed up the script itself, you could also chunk your entries, diff them against what is already stored, and insert in bulk, so you don't have to execute so many SQL statements.
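
As a rough sketch (the script name, file paths and batch size below are all placeholders), the cron-driven import could store the uploaded CSV on disk and then process a fixed number of rows per run, remembering its position between runs:

// import_products.php - hypothetical CLI script run from cron, e.g.
//   */5 * * * * php /var/www/scripts/import_products.php
$csvFile   = '/var/www/uploads/products.csv';
$stateFile = '/var/www/uploads/products.offset'; // byte offset of the last processed row
$batchSize = 20000;                              // rows handled per cron run

if (!is_file($csvFile)) {
    exit("No pending CSV file.\n");
}

$handle = fopen($csvFile, 'r');
$fields = fgetcsv($handle, 4096); // header row

// Resume where the previous run stopped
$offset = is_file($stateFile) ? (int) file_get_contents($stateFile) : 0;
if ($offset > 0) {
    fseek($handle, $offset);
}

$rows = array();
while (count($rows) < $batchSize && ($row = fgetcsv($handle, 4096)) !== false) {
    $rows[] = array_combine($fields, $row);
}

// Remember the position for the next run, or clean up once the file is finished
if (feof($handle)) {
    if (is_file($stateFile)) {
        unlink($stateFile);
    }
    // the CSV could be archived or deleted here
} else {
    file_put_contents($stateFile, ftell($handle));
}
fclose($handle);

// $rows can now be written to the database with a bulk statement like the one in the other answer.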
