
Batch import of textfile causes memory issue with Doctrine and Symfony (PHP)

I'm importing a 15 MB text file into a MySQL database. When I run the following, the data is only read (not imported), and memory usage stays constant at about 28 MB.

$handle = fopen("textfile.txt","r");
while (($data = fgetcsv($handle, 1024, "|")) !== false) {
    // processing data
}
fclose($handle);

When I bring Doctrine into play, however, memory usage grows and grows until the memory limit is exceeded and the script crashes.

gc_enable(); // Enable Garbage Collector
$handle = fopen("textfile.txt","r");
$i=0;
while (($data = fgetcsv($handle, 1024, "|")) !== false) {
    $myEntity = $this->doctrine->getRepository('MyBundle:MyEntity')->find($data[0]);
    if (!$myEntity) {
        $myEntity = new MyEntity();
        $myEntity->setId($data[0]);
        $myEntity->setName($data[1]);
    } else {
        $myEntity->setName($data[1]);
    }
    $this->em->persist($myEntity);
    if ($i%100==0) {
       $this->em->flush();
       $this->em->clear();
       gc_collect_cycles();
    }
    $i++;
}
fclose($handle);
$this->em->flush();
gc_disable(); // Disable Garbage Collector

Memory usage grows up to 256 MB, at which point the script fails because the memory limit is set to 256 MB. So what (else) can I do to keep memory usage low?

Try to delay Doctrine calls as much as you can. In this particular code, Doctrine's find() is called once per file row. You can optimize this by batching:

$i = 0;
$dataChunk = array();
while (($data = fgetcsv($handle, 1024, "|")) !== false) {

    # collect the data into array
    $dataChunk[$data[0]] = array( $data[0], $data[1] );

    if ($i > 0 && $i%100==0) {
        # lets prepare and persist the data
        $idsToRead = array_keys($dataChunk);

        # read all entities with the given IDs
        $entities = $this->doctrine->getRepository('MyBundle:MyEntity')->findEntitiesById($idsToRead);
        $existentIds = array_map(function($e){ return $e->getId(); }, $entities);

        # find out which IDs are non-existent
        $newIds = array_diff($idsToRead, $existentIds);

        # Overwrite the name of each existent entity
        foreach ($entities as $e){
            $e->setName($dataChunk[$e->getId()][1]);
        }

        # Create new entities for non-existent IDs
        foreach ($newIds as $id){
            $e = new MyEntity();
            $e->setId($id);
            $e->setName($dataChunk[$id][1]);
            $this->em->persist($e);
        }

        # finally, flush the data
        $dataChunk = array();
        $this->em->flush();
        $this->em->clear();
        gc_collect_cycles();
    }

    $i++;
}

Some things to pay attention to:

  • findEntitiesById would be a custom repository method; it does not exist out of the box
  • database communication is executed only on every 100th record
  • be sure to skip the 0th row (i = 0, i % 100 == 0 evaluates to TRUE); I have added that condition
  • be sure to apply the same logic to the leftover rows after the loop, if the total row count is not divisible by 100
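The last bullet can be sketched as follows — a non-authoritative outline that extracts the per-chunk work into a helper so the leftover rows after the loop reuse the same logic. It assumes the same hypothetical `findEntitiesById` repository method from above, and that `$em` and `$doctrine` are passed in:

```php
<?php
// Sketch: chunk processing extracted into a helper, so the final partial
// chunk (rows left over when the total is not divisible by 100) is
// handled exactly like the full chunks inside the loop.
function flushChunk($em, $doctrine, array $dataChunk)
{
    if (empty($dataChunk)) {
        return; // nothing buffered
    }
    $idsToRead = array_keys($dataChunk);

    // findEntitiesById is the custom repository method mentioned above
    $entities = $doctrine->getRepository('MyBundle:MyEntity')->findEntitiesById($idsToRead);
    $existentIds = array_map(function ($e) { return $e->getId(); }, $entities);

    // update existing entities
    foreach ($entities as $e) {
        $e->setName($dataChunk[$e->getId()][1]);
    }
    // create entities for IDs that were not found
    foreach (array_diff($idsToRead, $existentIds) as $id) {
        $e = new MyEntity();
        $e->setId($id);
        $e->setName($dataChunk[$id][1]);
        $em->persist($e);
    }
    $em->flush();
    $em->clear();          // detach all entities so they can be freed
    gc_collect_cycles();
}

// Inside the loop:
//     if ($i > 0 && $i % 100 == 0) { flushChunk($em, $doctrine, $dataChunk); $dataChunk = array(); }
// After the loop (and fclose):
//     flushChunk($em, $doctrine, $dataChunk);
```

This keeps the remainder handling from silently dropping the last partial chunk, which is an easy bug to introduce with modulus-based batching.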
