How can I import a CSV file into a MySQL database more efficiently with PHP?

I have a Symfony2 project and I need to import users into my database from a CSV file. I have to do some processing on the data before importing it into MySQL. I created a service for this and everything works fine, but it takes too long to execute and slows down my server if I give it my entire file. My files usually have between 500 and 1500 rows, so I have to split each file into files of ~200 rows and import them one by one.

I need to handle related users, who may be in the file and/or already in the database. A related user is usually the parent of a child.

Here is my simplified code:

$validator = $this->validator;

$members = array();
$children = array();
$mails = array();

$handle = fopen($filePath, 'r');
$datas = fgetcsv($handle, 0, ";");

while (($datas = fgetcsv($handle, 0, ";")) !== false) {

    $user = new User();

    //If there is a related user
    if($datas[18] != ""){
        $user->setRelatedMemberEmail($datas[18]);

        $relation = array_search(ucfirst(strtolower($datas[19])), UserComiti::$RELATIONSHIPS);
        if($relation !== false)
            $user->setParentRelationship($relation);
    }
    else {
        $user->setRelatedMemberEmail($datas[0]);
        $user->addRole ( "ROLE_MEMBER" );
    }

    $user->setEmail($mail);
    $user->setLastName($lastName);
    $user->setFirstName($firstName);
    $user->setGender($gender);
    $user->setBirthdate($birthdate);
    $user->setCity($city);
    $user->setPostalCode($zipCode);
    $user->setAddressLine1($adressLine1);
    $user->setAddressLine2($adressLine2);
    $user->setCountry($country);
    $user->setNationality($nationality);
    $user->setPhoneNumber($phone);

    //Entity Validation
    $listErrors = $validator->validate($user);

    //In case of errors
    if(count($listErrors) > 0) {
         foreach($listErrors as $error){
              $nbError++;
              $errors .= "Line " . $line . " : " . $error->getMessage() . "\n";
         }
   }

   else {
       if($mailParent != null)
            $children[] = $user;

       else{
            $members[] = $user;
            $nbAdded++;
       }
   }

   foreach($members as $user){
        $this->em->persist($user);
        $this->em->flush();
   }

   foreach($children as $child){

       //If the related user is already in DB
       $parent = $this->userRepo->findOneBy(array('username' => $child->getRelatedMemberEmail(), 'club' => $this->club));

       if ($parent !== false){

           //Check if someone related to related user already has the same last name and first name. If it is the case we can guess that this user is already created
           $testName = $this->userRepo->findByParentAndName($child->getFirstName(), $child->getLastName(), $parent, $this->club);

           if(!$testName){
                $child->setParent($parent);
                $this->em->persist($child);
                $nbAdded++;
           }
           else
                $nbSkipped++;
       }

       //Else in case the related user is neither file nor in database we create a fake one that will be able to update his profile later.
       else{

            $newParent = clone $child;
            $newParent->setUsername($child->getRelatedMemberEmail());
            $newParent->setEmail($child->getRelatedMemberEmail());
            $newParent->setFirstName('Unknown');

            $this->em->persist($newParent);
            $child->setParent($newParent);
            $this->em->persist($child);

            $nbAdded += 2;
            $this->em->flush();
        }
    }
}

This is not my whole service, because I don't think the rest would help here, but if you need more information, ask me.

While I do not have the means to quantitatively determine the bottlenecks in your program, I can suggest a couple of guidelines that will likely increase its performance significantly.

  1. Minimize the number of database commits you are making. A lot happens when you write to the database, and your code calls flush() once per member inside the loop. Is it possible to commit only once at the end?

  2. Minimize the number of database reads you are making. As with the previous point, a lot happens when you read from the database.
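
Both points can be sketched in plain PHP. This is a minimal sketch under stated assumptions, not the asker's actual service: persistInBatches and indexByEmail are hypothetical helper names, $em is assumed to be a Doctrine EntityManager (anything exposing persist() and flush() works), and the user objects are assumed to expose getEmail().

```php
<?php

/**
 * Persist every entity but flush once per batch instead of once per row.
 * $em is assumed to be a Doctrine EntityManager; the function only relies
 * on its persist() and flush() methods.
 */
function persistInBatches(object $em, iterable $entities, int $batchSize = 200): int
{
    $batchSize = max(1, $batchSize); // guard against zero or negative sizes
    $count = 0;

    foreach ($entities as $entity) {
        $em->persist($entity);
        $count++;

        if ($count % $batchSize === 0) {
            $em->flush(); // write the whole batch in one go
        }
    }

    $em->flush(); // write whatever is left over from the last partial batch

    return $count;
}

/**
 * Index already-loaded users by lower-cased e-mail so that each child can
 * find its parent in memory instead of triggering one SELECT per row.
 * Lookups must lower-case the e-mail the same way.
 * Assumes every user object has a getEmail() method.
 */
function indexByEmail(iterable $users): array
{
    $index = [];

    foreach ($users as $user) {
        $index[strtolower($user->getEmail())] = $user;
    }

    return $index;
}
```

With helpers like these, the loop over the CSV would only build User objects. Parents could be looked up in an array built once, for example from a single userRepo->findBy(array('club' => $this->club)) call, and the database would be written once per batch rather than once per row. Passing a batch size larger than the file gives the "commit only once at the end" behaviour.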


If you still have issues after considering the above points, determine what SQL the ORM is actually generating and executing. ORMs work great until efficiency becomes a problem and more care needs to go into ensuring that optimal queries are generated. At that point, becoming more familiar with both the ORM and SQL would be beneficial.


You don't seem to be working with much data, but if you were, MySQL itself supports reading CSV files.

The LOAD DATA INFILE statement reads rows from a text file into a table at a very high speed. https://dev.mysql.com/doc/refman/5.7/en/load-data.html

You may be able to access this MySQL-specific feature through your ORM, but if not, you would need to write some plain SQL to use it. Since you need to modify the data you are reading from the CSV, you could likely do this very quickly by following these steps:

  1. Use LOAD DATA INFILE to read the CSV into a temporary table.
  2. Manipulate the data in the temporary table and other tables as required.
  3. SELECT the data from the temporary table into your destination table.
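
The three steps above could look roughly like the following. This is only a sketch: buildStagingImportSql is a hypothetical helper, the table names user_import and app_user and the four-column CSV layout are invented for illustration (the real file has more columns), and the ';' delimiter and skipped header line are taken from the question's fgetcsv calls. The LOCAL variant reads the file from the client machine and only works when local_infile is enabled on both the server and the client (for PDO, via the PDO::MYSQL_ATTR_LOCAL_INFILE option).

```php
<?php

/**
 * Build the SQL statements for a staging-table import.
 * $quotedCsvPath must already be quoted for SQL, e.g. with PDO::quote().
 * All table and column names are hypothetical.
 */
function buildStagingImportSql(string $quotedCsvPath): array
{
    return [
        // Step 1: create a temporary staging table and load the raw CSV into it.
        'CREATE TEMPORARY TABLE user_import (
            email VARCHAR(255),
            first_name VARCHAR(255),
            last_name VARCHAR(255),
            related_email VARCHAR(255)
        )',
        "LOAD DATA LOCAL INFILE $quotedCsvPath
            INTO TABLE user_import
            FIELDS TERMINATED BY ';'
            IGNORE 1 LINES
            (email, first_name, last_name, related_email)",

        // Step 2: clean up the staged rows with set-based SQL.
        'UPDATE user_import SET email = LOWER(TRIM(email))',

        // Step 3: copy only the rows that do not exist yet into the destination table.
        'INSERT INTO app_user (email, first_name, last_name)
            SELECT s.email, s.first_name, s.last_name
            FROM user_import s
            LEFT JOIN app_user u ON u.email = s.email
            WHERE u.email IS NULL',
    ];
}
```

Each statement would then be run in order on one connection, for example by looping over buildStagingImportSql($pdo->quote($csvPath)) and calling $pdo->exec() on every entry. The temporary table disappears when that connection closes.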

I know this is a very old topic, but some time ago I created a bundle that can help import entities from CSV into a database. If someone comes across this topic, it may be helpful.

https://github.com/jgrygierek/BatchEntityImportBundle
https://github.com/jgrygierek/SonataBatchEntityImportBundle
