
Large Excel import to DB gets very slow in Symfony

I have a script that imports a large Excel file using a lot of nested foreach loops, and after 50-something iterations it gets unbearably slow. Can I improve this somehow?

I tried to make it as readable as possible with this:

foreach worksheet (approx 20) {
    NEW DB ENTRY, PERSIST, FLUSH (account)
    foreach row (10-100){
        NEW DB ENTRY, PERSIST, FLUSH (object)
        foreach column (approx. 10){
            CREATE NEW DB ENTRY, FOREIGN KEY to 'object', PERSIST, FLUSH (weekdates)
        }
        foreach column (approx. 50){
            CREATE NEW DB ENTRY, FOREIGN KEY to 'object', PERSIST, FLUSH (scheduleEntry)

            CREATE NEW DB ENTRY, FOREIGN KEY to 'scheduleEntry', PERSIST, FLUSH (scheduleObject)

            CREATE NEW DB ENTRY, FOREIGN KEY to 'scheduleObject', PERSIST, FLUSH (scheduleModule)

           /* WORST CASE IS THAT HERE WE HAVE FLUSHED 100000 times */
        }
    }
}

Is there a way to speed this up, especially the last foreach? I think I need to flush every time because I have to set a FOREIGN KEY from the new entry to the previous one, am I right? By slow I mean that the Excel file takes 24+ hours to import. It has roughly the numbers shown in the example.

The actual (still simplified) code looks something like this:

/* Create Excel */
$excel = $this->getContainer()->get('phpexcel')->createPHPExcelObject(Constants::FULL_PATH . 'excel/touren_' . $filename . '.xls');
$sheets = $excel->getAllSheets();
foreach ($sheets as $id => $sheet) {
    $ws = $sheet->toArray();

    /* Read sth from first line and create an 'account' from this */
    $n = new Network();
    ....
    $em->persist($n);

    try {
        $em->flush();
        $output->writeln('----><info>Inserted in DB</info>');
    } catch (\Exception $e) {
        $output->writeln('----><error>DB ERROR</error>');
    }

    /* Go through all rows of current WorkSheet */
    foreach ($ws as $row) {
        /* Create new Object */
        $object = new Object();
        ...
        $em->persist($object);

        try {
            $em->flush();
            $output->writeln("------->Save Object to DB: <info>OK</info>");
        } catch (\Exception $e) {
            $output->writeln("------->Save Object to DB: <error>Failed: " . $e->getMessage() . "</error>");
        }

       /* Create new Tour for weekday/client */
       $tour = new Tour();
       $tour->setNetwork($n);

      /* More foreach */
      foreach ($clientKey as $filialNo => $filialKey) {
          $tourObject = new TourObject();
          $tourObject->setTour($tour);
          $tourObject->setObject($object);
          $em->persist($tourObject);


         /* Count Intervals */
        foreach ($filialKey as $tasks) {
            if (!$tourObject->getModule()->contains($module)) {
                $tourObject->addModule($module);
                $em->persist($tourObject);

                /* More foreach */
                foreach ($period as $date) {
                    $schedule = new Schedule();
                    $schedule->setTour($tour);
                    ....
                    $em->persist($schedule);
                    try {
                        $em->flush();
                        $output->writeln("------->Save Schedule to DB: <info>OK</info>");
                    } catch (\Exception $e) {
                        $output->writeln("------->Save Schedule to DB: <error>Failed: " . $e->getMessage() . "</error>");
                    }


                    $scheduleObject = new ScheduleObject();
                    $scheduleObject->setSchedule($schedule);
                    ....
                    $em->persist($scheduleObject);
                    try {
                        $em->flush();
                        $output->writeln("------->Save ScheduleObject to DB: <info>OK</info>");
                    } catch (\Exception $e) {
                        $output->writeln("------->Save ScheduleObject to DB: <error>Failed: " . $e->getMessage() . "</error>");
                    }

                    $scheduleObjectModule = new ScheduleObjectModule();
                    $scheduleObjectModule->setScheduleObject($scheduleObject);
                    $em->persist($scheduleObjectModule);
                    try {
                        $em->flush();
                        $output->writeln("------->Save ScheduleObjectModule to DB: <info>OK</info>");
                    } catch (\Exception $e) {
                        $output->writeln("------->Save ScheduleObjectModule to DB: <error>Failed: " . $e->getMessage() . "</error>");
                    }
                }
            }
        }
      }

      /* Flush all?!? */
      try {
            $em->flush();
            $output->writeln("------->Save Task to DB: <info>OK</info>");
      } catch (\Exception $e) {
            $output->writeln("------->Save Task to DB: <error>Failed: " . $e->getMessage() . "</error>");
      }
    }
}

Every entity you persist through the EntityManager is tracked in its UnitOfWork as a "managed" entity. As the UnitOfWork fills up it becomes fairly heavy on the system, because every flush has to check all managed entities for changes. You could call $entityManager->clear() after each "sheet" so that the UoW is emptied on each iteration.

The EntityManager has a single UnitOfWork that tracks all entities. In Doctrine ORM 2.x, clear() also accepts an entity class name so that only entities of that class are detached, but since you create lots of entities of several types, I would suggest not specifying an entity class and just clearing all of them.

  ...
  /* Flush all?!? */
  try {
        $em->flush();
        $em->clear();
        $output->writeln("------->Save Task to DB: <info>OK</info>");
  } catch (\Exception $e) {
        $output->writeln("------->Save Task to DB: <error>Failed: " . $e->getMessage() . "</error>");
  }

Alternatively, you could use native queries to insert into your DB, but that might not always be what you want in terms of data consistency and so on.
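A minimal sketch of what the native-query route could look like, using the DBAL connection behind the EntityManager. The table and column names (`schedule`, `schedule_object`, `schedule_object_module`, `tour_id`, `scheduled_on`, ...) and the helper name `insertScheduleChain` are assumptions for illustration; they are not taken from the question. Wrapping the whole chain in one transaction avoids paying for a commit per row.

```php
<?php
use Doctrine\DBAL\Connection;

/**
 * Hypothetical helper: inserts one schedule -> scheduleObject -> module chain
 * per date with plain DBAL calls, all inside a single transaction.
 * Table and column names are assumed, not taken from the question.
 */
function insertScheduleChain(Connection $conn, int $tourId, array $dates): void
{
    $conn->beginTransaction();
    try {
        foreach ($dates as $date) {
            // $date is expected to be a 'Y-m-d' string here.
            $conn->insert('schedule', ['tour_id' => $tourId, 'scheduled_on' => $date]);
            $scheduleId = $conn->lastInsertId();

            $conn->insert('schedule_object', ['schedule_id' => $scheduleId]);
            $scheduleObjectId = $conn->lastInsertId();

            $conn->insert('schedule_object_module', ['schedule_object_id' => $scheduleObjectId]);
        }
        $conn->commit();
    } catch (\Throwable $e) {
        // Undo the partial import and let the caller decide what to log.
        $conn->rollBack();
        throw $e;
    }
}
```

It would be called with `$em->getConnection()`. Note that this bypasses the ORM entirely, so lifecycle callbacks and entity-level validation do not run.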

Also, as pointed out above, you don't need to flush after each entity. Because the associations are set on the entity objects themselves, Doctrine works out the insert order and fills in the foreign keys at flush time. If you call flush only once per "sheet", Doctrine will execute all the INSERT statements together in a single transaction.
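Flushing once per sheet could be sketched roughly as follows. The entity classes are the ones from the question (assumed to be imported), the function name `importSheet` and its parameters are hypothetical, and the elided setters from the question are left out. Persisting `$tour` explicitly is also an assumption; the question may rely on cascade persist instead.

```php
<?php
use Doctrine\ORM\EntityManagerInterface;

/**
 * Hypothetical helper: builds the whole object graph for one worksheet,
 * then flushes once and clears the UnitOfWork.
 */
function importSheet(EntityManagerInterface $em, array $rows, array $period): void
{
    $network = new Network();
    $em->persist($network);

    foreach ($rows as $row) {
        $tour = new Tour();
        $tour->setNetwork($network);
        $em->persist($tour);

        foreach ($period as $date) {
            $schedule = new Schedule();
            $schedule->setTour($tour);
            $em->persist($schedule);

            // No flush needed here: the association is set on the object,
            // and the foreign key is filled in when flush() runs.
            $scheduleObject = new ScheduleObject();
            $scheduleObject->setSchedule($schedule);
            $em->persist($scheduleObject);

            $module = new ScheduleObjectModule();
            $module->setScheduleObject($scheduleObject);
            $em->persist($module);
        }
    }

    $em->flush(); // one transaction with all INSERTs for this sheet
    $em->clear(); // detach everything so the next sheet starts with an empty UoW
}
```

One thing to watch: after clear(), every previously managed entity is detached. If you clear more often than once per sheet (for example after each row), an object such as the Network has to be re-obtained before it is used again, for instance with `$em->getReference(Network::class, $id)`.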

I think a good solution is to use a native DB utility for this (like MySQL's LOAD DATA INFILE).

This is going to be a lot faster than anything you can write in PHP.
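As a rough illustration of that route, the rows from the Excel file can be written to a CSV file and then loaded in one statement. The helper names `writeCsv` and `buildLoadDataSql`, and the table and column names in the usage lines, are made up for this sketch. It also assumes that `LOCAL INFILE` is enabled on both the MySQL server and the client connection, which is often not the default.

```php
<?php
/**
 * Hypothetical helper: writes rows to a CSV file that LOAD DATA can read.
 * Returns the number of rows written.
 */
function writeCsv(string $path, array $rows): int
{
    $handle = fopen($path, 'w');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path for writing");
    }
    $written = 0;
    foreach ($rows as $row) {
        fputcsv($handle, $row, ',', '"', '\\');
        $written++;
    }
    fclose($handle);
    return $written;
}

/** Hypothetical helper: builds the LOAD DATA statement for a table and column list. */
function buildLoadDataSql(string $path, string $table, array $columns): string
{
    return sprintf(
        "LOAD DATA LOCAL INFILE '%s' INTO TABLE %s "
        . "FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"' "
        . "LINES TERMINATED BY '\\n' (%s)",
        addslashes($path),
        $table,
        implode(', ', $columns)
    );
}

// Usage with assumed table and column names:
$path = sys_get_temp_dir() . '/schedule.csv';
writeCsv($path, [[1, '2016-01-04'], [1, '2016-01-05']]);
echo buildLoadDataSql($path, 'schedule', ['tour_id', 'scheduled_on']), PHP_EOL;
```

The generated statement could then be run through the DBAL connection, e.g. `$em->getConnection()->executeStatement($sql)`. Because the child tables in the question reference auto-generated IDs of their parents, loading them this way needs the IDs to be known up front (or the tables to be loaded in dependency order with a lookup step in between).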
