Enhance csv file database import

I'm using the script below to import a large csv file to my database.

If the table is empty the process takes about 5 minutes to finish on a local machine.

If I'm using the file to update existing values on the same table it takes more than 15 minutes to finish.

My csv file contains about 35,000 rows.

How can I speed up the process?

    if ( $request->get( $_POST["action"] ) == "import" ) {

        $file = $upload->file_upload( "import", "media/import" );
        if ( file_exists( DIR_UPLOAD_PHOTO . "/media/import/" . $file ) ) {

            $file   = DIR_UPLOAD_PHOTO . "/media/import/" . $file;
            $handle = fopen( $file, "r" );

            if ( $handle ) {
                $lines = explode( "\r", fread( $handle, filesize( $file ) ) );
            }

            $total_array = count( $lines );

            $x = 0;

            foreach ( $lines as $line ) {

                if ( $x >= 1 ) {
                    $data = explode( "|", $line );

                    $titlu          = trim( addslashes( $data[0] ) );
                    $alias          = $this->generate_seo_link( $titlu );
                    $gramaj         = trim( $data[1] );
                    $greutate       = trim( $data[2] );
                    $pret_total     = trim( $data[3] );
                    $pret_redus     = trim( $data[4] );
                    $poza           = trim( $data[5] );
                    $pret_unitar    = trim( $data[6] );
                    $categorie      = trim( $data[7] );
                    $brand          = trim( addslashes( $data[8] ) );
                    $descriere      = trim( addslashes( $data[9] ) );
                    $vizibil        = trim( $data[10] );
                    $cod            = trim( $data[11] );
                    $nou            = trim( $data[12] );
                    $cant_variabila = trim( $data[13] );
                    $congelat       = trim( $data[14] );
                    $tva            = trim( $data[15] );
                    $stoc           = trim( $data[16] );

                    if ( $cod != "" && $cod != " " ) {

                        $verificare = $database->select( "SELECT alias FROM produse WHERE alias LIKE '%" . $alias . "%'" );
                        for ( $i = 0; $i < $database->countRows(); $i++ ) {
                            if ( $alias == $verificare['alias'][$i] ) {
                                $alias = $this->increment_string( $alias, '_', 1 );
                            } else {
                                $alias = $alias;
                            }
                        }

                        $database->insert( sprintf( "insert into produse set
                            titlu='%s',
                            alias='%s',
                            gramaj='%s',
                            greutate='%s',
                            prettotal='%s',
                            pretredus='%s',
                            poza='%s',
                            pretunitar='%s',
                            categorie='%d',
                            brand='%s',
                            descriere='%s',
                            vizibil='%d',
                            cod='%s',
                            nou='%d',
                            cant_variabila='%d',
                            congelat = '%d',
                            tva = '%s',
                            stoc = '%d'

                            on duplicate key update

                            titlu='%s',
                            gramaj='%s',
                            greutate='%s',
                            prettotal='%s',
                            pretredus='%s',
                            poza='%s',
                            pretunitar='%s',
                            categorie='%d',
                            brand='%s',
                            descriere='%s',
                            vizibil='%d',
                            cod='%s',
                            nou='%d',
                            cant_variabila='%d',
                            congelat = '%d',
                            tva='%s',
                            stoc= '%d'",

                            $titlu, $alias,
                            $gramaj, $greutate, $pret_total, $pret_redus, $poza, $pret_unitar, $categorie,
                            $brand, $descriere, $vizibil, $cod, $nou, $cant_variabila, $congelat,
                            $tva, $stoc,

                            $titlu, $gramaj, $greutate,
                            $pret_total, $pret_redus, $poza, $pret_unitar, $categorie, $brand, $descriere,
                            $vizibil, $cod, $nou, $cant_variabila, $congelat, $tva, $stoc ) );

                    }
                }
                $x++;
            }

        }
    }

And here is my incrementing function:

    function increment_string($str, $separator = '-', $first = 1){
        preg_match('/(.+)'.$separator.'([0-9]+)$/', $str, $match);

        return isset($match[2]) ? $match[1].$separator.($match[2] + 1) : $str.$separator.$first;
    }

First off, the less work you do, the faster it goes. However, many database imports are slow because of the hard drive: not the CPU, not insufficient RAM, but the disk.

Here's why: a hard disk operates in terms of input/output operations per second, which I'll refer to as I/Os. That's a number manufacturers rarely advertise. They advertise things like bandwidth and burst read speed, which are mostly useless numbers here, something like DPI for mice.

A mechanical disk has a relatively low number of I/Os available. The number varies by drive, roughly between 100 and 400 I/Os per second. An SSD has far more, from 5,000 to 80,000 (and more).

That means a mechanical disk can perform, say, 400 writes in 1 second, while an SSD can do 5,000. The problem is that database queries are usually small in terms of data (about 4KB).

If you do the simple math, 400 I/Os * 4KB, you get ~1.6 MB/sec. What it indicates is that you are spending all of the I/Os but not all of your disk's bandwidth.
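The arithmetic above can be checked with a few lines of PHP. This is only a back-of-the-envelope sketch: the function name `throughput_mb_per_sec` is made up for illustration, and the 400 I/Os and 4KB figures are the rough numbers used in this answer, not measurements of any particular drive.

```php
<?php
// Rough estimate of how much data per second a disk moves when every
// write is a separate small I/O. Hypothetical helper; uses decimal units
// (1 MB = 1000 KB) to match the ~1.6 MB/sec figure quoted above.
function throughput_mb_per_sec(int $iops, float $kbPerIo): float
{
    return $iops * $kbPerIo / 1000;
}

// 400 small writes per second, about 4 KB each.
echo throughput_mb_per_sec(400, 4), " MB/sec\n";       // prints "1.6 MB/sec"

// Idealized case (an assumption, not a measurement): the same 400 I/Os,
// but each one carries 50 such rows instead of 1.
echo throughput_mb_per_sec(400, 4 * 50), " MB/sec\n";  // prints "80 MB/sec"
```

The second figure is an upper bound under the assumption that a batch really lands in a single I/O; real numbers depend on the storage engine and its flush settings.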

That also hints that you could write more data per I/O. In plain language, it simply means that you should start a transaction, issue several INSERT queries (say, 50 INSERTs), and then commit the transaction.

That way you spend roughly 1 I/O for 50 inserts instead of 50, so in the best case the write side gets close to 50 times faster. If you use prepared statements, this becomes even more efficient because MySQL doesn't have to parse the query every time you send it.

I won't post any code because you should be able to fix it on your own. Also, your code is open to SQL injection. You have a few things to modify, and if you are not sure what prepared statements are, shout back.
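For readers who do want a starting point, here is a minimal sketch of the batching idea using PDO rather than the `$database` wrapper from the question (whose API isn't shown). The helper name `import_in_batches`, the batch size of 50, and the shortened column list in the usage comment are all assumptions for illustration. It also assumes a transactional engine such as InnoDB and a unique key on `cod`; with the MySQL driver you may want to set `PDO::ATTR_EMULATE_PREPARES` to `false` so the statement is actually prepared on the server.

```php
<?php
/**
 * Run one prepared statement for every row, committing once per batch
 * instead of once per row. Returns the number of rows executed.
 * Hypothetical helper, not part of the asker's $database wrapper.
 */
function import_in_batches(PDO $pdo, string $sql, iterable $rows, int $batchSize = 50): int
{
    $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    $stmt  = $pdo->prepare($sql);   // prepared once, executed many times
    $count = 0;

    $pdo->beginTransaction();
    try {
        foreach ($rows as $row) {
            $stmt->execute($row);   // values are bound, never concatenated into the SQL
            $count++;
            if ($count % $batchSize === 0) {
                $pdo->commit();     // flush this batch and open the next one
                $pdo->beginTransaction();
            }
        }
        $pdo->commit();             // commit the final, possibly partial, batch
    } catch (Throwable $e) {
        if ($pdo->inTransaction()) {
            $pdo->rollBack();       // discard the batch that failed
        }
        throw $e;
    }

    return $count;
}

// Possible usage against MySQL (column list shortened, assumes `cod` is unique):
//
// $sql = "INSERT INTO produse (cod, titlu, stoc) VALUES (?, ?, ?)
//         ON DUPLICATE KEY UPDATE titlu = VALUES(titlu), stoc = VALUES(stoc)";
// import_in_batches($pdo, $sql, [['A1', 'First product', 10], ['A2', 'Second product', 0]]);
```

Because the values are bound as parameters, the `addslashes()` calls from the question are no longer needed, which also closes the SQL injection hole.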

Push the SELECT into the INSERT so that both run on the server, instead of going back and forth between the client and the server.
