简体   繁体   中英

Trouble reading huge CSV file with php fgetcsv - understanding memory consumption

Good morning, I´m actually going through some hard lessons while trying to handle huge csv files up to 4GB.

Goal is to search some items in a csv file (Amazon datafeed) by a given browsenode and also by some given item id´s (ASIN). To get a mix of existing items (in my database) plus some additional new itmes since from time to time items disapear on the marketplace. I also filter the title of the items because there are many items using the same.

I have been reading here lots af tips and finally decided to use php´s fgetcsv() and thought this function will not exhaust memory, since it reads the file line by line. But no matter what I try I´m always running out of memory. I can not understand why my code uses so much memory.

I set the memory limit to 4096MB, time limit is 0. Server has 64 GB Ram and two SSD hardisks.

May someone please check out my piece of code and explain how it is possible that im running out of memory and more important how memory is used?

private function performSearchByASINs()
{
    $found = 0;
    $needed = 0;
    $minimum = 84;
    if(is_array($this->searchASINs) && !empty($this->searchASINs))
    {
        $needed = count($this->searchASINs);
    }
    if($this->searchFeed == NULL || $this->searchFeed == '')
    {
        return false;
    }
    $csv = fopen($this->searchFeed, 'r');
    if($csv)
    {
        $l = 0;
        $title_array = array();
        while(($line = fgetcsv($csv, 0, ',', '"')) !== false)
        {
            $header = array();
            if(trim($line[6]) != '')
            {
                if($l == 0)
                {
                    $header = $line;
                }
                else
                {
                    $asin = $line[0];
                    $title = $this->prepTitleDesc($line[6]);
                    if(is_array($this->searchASINs) 
                    && !empty($this->searchASINs) 
                    && in_array($asin, $this->searchASINs)) //search for existing items to get them updated
                    {
                        $add = true;
                        if(in_array($title, $title_array))
                        {
                            $add = false; 
                        }
                        if($add === true)
                        {
                            $this->itemsByASIN[$asin] = new stdClass();
                            foreach($header as $k => $key)
                            {
                                if(isset($line[$k]))
                                {
                                    $this->itemsByASIN[$asin]->$key = trim(strip_tags($line[$k], '<br><br/><ul><li>'));
                                }
                            }
                            $title_array[] = $title;
                            $found++;
                        }
                    }
                    if(($line[20] == $this->bnid || $line[21] == $this->bnid) 
                    && count($this->itemsByKey) < $minimum 
                    && !isset($this->itemsByASIN[$asin])) // searching for new items
                    {
                        $add = true;
                        if(in_array($title, $title_array))
                        {
                           $add = false;
                        }
                        if($add === true)
                        {
                            $this->itemsByKey[$asin] = new stdClass();
                            foreach($header as $k => $key)
                            {
                                if(isset($line[$k]))
                                {
                                    $this->itemsByKey[$asin]->$key = trim(strip_tags($line[$k], '<br><br/><ul><li>'));                                
                                }
                            }
                            $title_array[] = $title;
                            $found++;
                        }
                    }
                }
                $l++;
                if($l > 200000 || $found == $minimum)
                {
                    break;
                }
            }
        }
        fclose($csv);
    }
}

I know my answer is a bit late but I had a similar problem with fgets() and things based on fgets() like SplFileObject->current() function. In my case it was on a windows system when trying to read a +800MB file. I think fgets() doesn't free the memory of the previous line in a loop . So every line that was read stayed in memory and let to a fatal out of memory error . I fixed it using fread($lineLength) instead but it is a bit trickier since you must supply the length.

It is very hard to manage large data using array without encountering timeout issue. Instead why not parse this datafeed to a database table and do the heavy lifting from there.

Have you tried this? SplFileObject::fgetcsv

<?php
$file = new SplFileObject("data.csv");
while (!$file->eof()) {
    //your code here
}
?>

You are running out of memory because you use variables, and you are never doing an unset(); and use too many nested foreach . You could shrink that code in more functions A solution should be, use a real Database instead.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM