
Enhance csv file database import

I'm using the script below to import a large csv file into my database.

If the table is empty, the process takes about 5 minutes to finish on a local machine.

If I use the file to update existing values in the same table, it takes more than 15 minutes to finish.

My csv file contains about 35,000 rows.

How can I speed up the process?

    if ( $request->get( $_POST["action"] ) == "import" ) {

        $file = $upload->file_upload( "import", "media/import" );
        if ( file_exists( DIR_UPLOAD_PHOTO . "/media/import/" . $file ) ) {

            $file   = DIR_UPLOAD_PHOTO . "/media/import/" . $file;
            $handle = fopen( $file, "r" );

            if ( $handle ) {
                $lines = explode( "\r", fread( $handle, filesize( $file ) ) );
            }

            $total_array = count( $lines );

            $x = 0;

            foreach ( $lines as $line ) {

                if ( $x >= 1 ) {
                    $data = explode( "|", $line );

                    $titlu          = trim( addslashes( $data[0] ) );
                    $alias          = $this->generate_seo_link( $titlu );
                    $gramaj         = trim( $data[1] );
                    $greutate       = trim( $data[2] );
                    $pret_total     = trim( $data[3] );
                    $pret_redus     = trim( $data[4] );
                    $poza           = trim( $data[5] );
                    $pret_unitar    = trim( $data[6] );
                    $categorie      = trim( $data[7] );
                    $brand          = trim( addslashes( $data[8] ) );
                    $descriere      = trim( addslashes( $data[9] ) );
                    $vizibil        = trim( $data[10] );
                    $cod            = trim( $data[11] );
                    $nou            = trim( $data[12] );
                    $cant_variabila = trim( $data[13] );
                    $congelat       = trim( $data[14] );
                    $tva            = trim( $data[15] );
                    $stoc           = trim( $data[16] );

                    if ( $cod != "" && $cod != " " ) {

                        $verificare = $database->select( "SELECT alias FROM produse WHERE alias LIKE '%" . $alias . "%'" );
                        for ( $i = 0; $i < $database->countRows(); $i++ ) {
                            if ( $alias == $verificare['alias'][$i] ) {
                                $alias = $this->increment_string( $alias, '_', 1 );
                            } else {
                                $alias = $alias;
                            }
                        }

                        $database->insert( sprintf( "insert into produse set
                            titlu='%s',
                            alias='%s',
                            gramaj='%s',
                            greutate='%s',
                            prettotal='%s',
                            pretredus='%s',
                            poza='%s',
                            pretunitar='%s',
                            categorie='%d',
                            brand='%s',
                            descriere='%s',
                            vizibil='%d',
                            cod='%s',
                            nou='%d',
                            cant_variabila='%d',
                            congelat = '%d',
                            tva = '%s',
                            stoc = '%d'

                            on duplicate key update

                            titlu='%s',
                            gramaj='%s',
                            greutate='%s',
                            prettotal='%s',
                            pretredus='%s',
                            poza='%s',
                            pretunitar='%s',
                            categorie='%d',
                            brand='%s',
                            descriere='%s',
                            vizibil='%d',
                            cod='%s',
                            nou='%d',
                            cant_variabila='%d',
                            congelat = '%d',
                            tva='%s',
                            stoc= '%d'",

                            $titlu, $alias,
                            $gramaj, $greutate, $pret_total, $pret_redus, $poza, $pret_unitar, $categorie,
                            $brand, $descriere, $vizibil, $cod, $nou, $cant_variabila, $congelat,
                            $tva, $stoc,

                            $titlu, $gramaj, $greutate,
                            $pret_total, $pret_redus, $poza, $pret_unitar, $categorie, $brand, $descriere,
                            $vizibil, $cod, $nou, $cant_variabila, $congelat, $tva, $stoc ) );

                    }
                }
                $x++;
            }

        }
    }

And here is my incrementing function:

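    function increment_string($str, $separator = '-', $first = 1) {
        // Quote the separator so characters such as "." or "/" are matched literally.
        preg_match('/(.+)' . preg_quote($separator, '/') . '([0-9]+)$/', $str, $match);

        // Bump an existing numeric suffix, otherwise append the first one.
        return isset($match[2]) ? $match[1] . $separator . ($match[2] + 1) : $str . $separator . $first;
    }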

First off, the less you do, the faster it is. However, many database imports are slow because of the hard drive: not the CPU, not insufficient RAM, but the hard drive.

Here's why: a hard disk is limited by the number of input/output operations it can perform per second; I'll refer to these as I/Os. That's the number manufacturers don't advertise. They advertise things like bandwidth and burst read speed, which are mostly useless numbers, something like DPI for mice.

A mechanical disk has a relatively low number of I/Os available. The number varies with the drive, but it can be anywhere between 100 and 400 I/Os per second. An SSD has far more available, from 5,000 to 80,000 (and more).

That means a mechanical disk can perform, say, 400 writes in 1 second, while an SSD can do 5,000. The problem is that database queries are usually small in terms of data (about 4KB).

If you do the simple math, 400 I/Os * 4KB, you get about 1.6 MB/sec. What this indicates is that you are spending all of the I/Os but not all of your disk's bandwidth.
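The arithmetic above can be written out as a few lines of PHP. This is only a back-of-the-envelope sketch: the function name `throughput_mb_per_sec` is made up for illustration, and the inputs are the illustrative figures from the text, not measurements.

```php
<?php
// Effective throughput when every small query costs one I/O.
// Inputs are the example figures from the text, not benchmark results.
function throughput_mb_per_sec(int $iops, int $kbPerOperation): float
{
    return $iops * $kbPerOperation / 1000; // KB/s converted to MB/s (decimal units)
}

$hdd = throughput_mb_per_sec(400, 4);   // 1.6
$ssd = throughput_mb_per_sec(5000, 4);  // 20.0

printf("HDD: %.1f MB/s, SSD: %.1f MB/s\n", $hdd, $ssd);
```

Either figure is far below what the same drive can sustain in sequential writes, which is the point: the I/O count runs out long before the bandwidth does.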

That also hints that you could write more data per I/O. In plain language, it simply means that you should start a transaction, issue several INSERT queries (say, 50 INSERTs) and then commit the transaction.

That way you spend roughly 1 I/O for 50 inserts, which can make the import up to about 50 times faster. If you use prepared statements, this becomes even more efficient, because MySQL doesn't have to parse the query every time you send it.

I won't send any code because you should be able to fix it on your own. Also, your code is open to SQL injection. You have a few things to modify, and if you are not sure what prepared statements are, shout back.
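For readers who want a starting point anyway, here is a minimal sketch of the batching idea, not a drop-in replacement for the script in the question. It assumes PDO rather than the question's own `$database` wrapper, uses a hypothetical helper named `import_rows`, and works on a reduced column set (`cod`, `titlu`, `stoc`). The demo runs against an in-memory SQLite database (which requires the `pdo_sqlite` extension) so that it is self-contained; against MySQL the statement would presumably keep the question's `on duplicate key update` clause.

```php
<?php
// Sketch: one prepared statement, committed in batches of $batchSize rows.
// import_rows() and the reduced column set are assumptions for illustration.
function import_rows(PDO $pdo, iterable $rows, int $batchSize = 50): int
{
    // Prepared once, executed many times; values are bound, never concatenated.
    $stmt = $pdo->prepare('INSERT INTO produse (cod, titlu, stoc) VALUES (?, ?, ?)');
    $count = 0;

    $pdo->beginTransaction();
    try {
        foreach ($rows as $row) {
            $stmt->execute([$row['cod'], $row['titlu'], $row['stoc']]);
            // Commit every $batchSize rows, then open the next batch.
            if (++$count % $batchSize === 0) {
                $pdo->commit();
                $pdo->beginTransaction();
            }
        }
        $pdo->commit(); // flush the final, possibly partial, batch
    } catch (Throwable $e) {
        if ($pdo->inTransaction()) {
            $pdo->rollBack(); // discard only the batch that failed
        }
        throw $e;
    }

    return $count;
}

// Demo on an in-memory SQLite database so the sketch is self-contained.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE produse (cod TEXT PRIMARY KEY, titlu TEXT, stoc INTEGER)');

$rows = [];
for ($i = 1; $i <= 120; $i++) {
    $rows[] = ['cod' => "C$i", 'titlu' => "Produs $i", 'stoc' => $i];
}

$inserted = import_rows($pdo, $rows);
echo "Inserted $inserted rows\n";
```

Binding the values through the prepared statement also removes the need for `addslashes`, which is what closes the SQL injection hole. The batch size of 50 is just the figure used in the text; it is worth measuring a few sizes on the real data.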

Push the SELECT into the INSERT so that both run on the server, instead of going back and forth between the client and the server.
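One way to read that suggestion, sketched under several assumptions: the alias check becomes a subquery inside an `INSERT ... SELECT`, so each row costs a single round trip. The helper name `insert_with_unique_alias` is hypothetical, the column set is reduced, and the rule is simplified to "append `_1` if the exact alias already exists", which is not identical to the loop in the question and does not guard against `_1` itself being taken. The demo uses in-memory SQLite via PDO (`pdo_sqlite` required); MySQL should accept the same statement shape, but that is untested here.

```php
<?php
// Sketch: the existence check runs inside the INSERT, on the server.
// insert_with_unique_alias() and the simplified "_1" rule are assumptions.
function insert_with_unique_alias(PDO $pdo, string $cod, string $titlu, string $alias): void
{
    $stmt = $pdo->prepare(
        "INSERT INTO produse (cod, titlu, alias)
         SELECT ?, ?,
                CASE WHEN EXISTS (SELECT 1 FROM produse WHERE alias = ?)
                     THEN ?   -- alias already taken: use the suffixed form
                     ELSE ?   -- alias free: use it unchanged
                END"
    );
    $stmt->execute([$cod, $titlu, $alias, $alias . '_1', $alias]);
}

// Demo on an in-memory SQLite database so the sketch is self-contained.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE produse (cod TEXT PRIMARY KEY, titlu TEXT, alias TEXT)');

insert_with_unique_alias($pdo, 'A1', 'Paine alba', 'paine-alba');
insert_with_unique_alias($pdo, 'A2', 'Paine alba', 'paine-alba');

$aliases = $pdo->query('SELECT alias FROM produse ORDER BY cod')->fetchAll(PDO::FETCH_COLUMN);
print_r($aliases);
```

Matching on the exact alias instead of `LIKE '%...%'` also lets the server use an index on `alias`, if one exists.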
