
Handling large amounts of data in php without using ini_set('memory_limit', '-1');

I am trying to build a sentiment classifier in PHP. I have a file of bigrams containing 1020386 records. Loading the file works, but when I perform an operation on it I get "Allowed memory size of 134217728 bytes exhausted". I tried processing 1000 records at a time, but I still hit the same issue. I am using CodeIgniter and the file helper class.

 $reader = new FilePrep();
 $content = $reader->read(base_url().'Assets/files/w2_.txt');
 $delimited = explode(PHP_EOL, $content);
 $ngrams = array();
 for($from = 0; $to = sizeof($delimited) ; $from+=1000){
        $new = array_slice($ngrams, $from, 1000);
        foreach($new as $ngram){
            $del = explode(' ', $ngram);
            array_push($ngrams, array($del[0],$del[1].' '.$del[2]));
        }
 }
 print_r($ngrams);

FilePrep.php

public function read($path){
    $handle = fopen($path,'r');
    $string = stream_get_contents($handle);
    return $string;
}

Thanks in advance

UPDATE:

I missed some issues in the code at first. First, the for loop is wrong: its condition never becomes false, so PHP never stops and you get the memory error. Second, you load the whole file into a variable and then split it into an array; it is better to read the file line by line and assign the data as you go. Finally, you can't fit 1M lines into a two-dimensional array under this limit: every array_push increases PHP's memory usage by about 650 bytes (based on your w2_.txt example file), which adds up to roughly 650 MB for the whole file, well above the 128 MB (134217728 bytes) limit.
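To illustrate the first point, here is a minimal sketch of the batching loop with a working exit condition. The sample records and the batch size of 2 are made up for the demonstration, and it assumes three space-separated tokens per record as in the question's code. Note that the question's loop also slices the (empty) result array $ngrams rather than $delimited; the sketch slices the source array instead.

```php
<?php
// Hypothetical sample standing in for the exploded file contents.
$delimited = array('a b c', 'd e f', 'g h i', 'j k l', 'm n o');
$batchSize = 2; // the question used 1000
$total = count($delimited);
$ngrams = array();

// "$from < $total" is a comparison, so the loop ends once every batch is consumed.
// The original condition "$to = sizeof($delimited)" is an assignment and is always truthy.
for ($from = 0; $from < $total; $from += $batchSize) {
    // Slice the source array, not the result array.
    $batch = array_slice($delimited, $from, $batchSize);
    foreach ($batch as $ngram) {
        $del = explode(' ', $ngram);
        $ngrams[] = array($del[0], $del[1] . ' ' . $del[2]);
    }
}

echo count($ngrams), "\n"; // prints 5
```

This only fixes the endless loop; it still keeps every record in memory, so it does not by itself solve the memory limit for the full file.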

Take a look at the code below. It shows how PHP's memory usage grows as you add data to the array:

Example:

$handle = fopen('w2_.txt', 'r');

$ngrams = array();
$i=0;
echo $i . ': ' . memory_get_usage() . "\t";

$current_memory_usage = memory_get_usage();
while (($line = fgets($handle, 8192)) !== false) {
    $i++;
    echo "File line # $i \t";
    $del = explode("\t", $line);
    array_push($ngrams, array($del[0],$del[1].' '.$del[2]));

    echo "[ rised: " . (memory_get_usage() - $current_memory_usage) . "\t total: " . memory_get_usage() . "]\n";
    $current_memory_usage = memory_get_usage();
}
fclose($handle); 

gives output:

FILE              MEMORY RISED   TOTAL MEMORY USED
File line # 1   [ rised: 9984    total: 254248]
File line # 2   [ rised: 616     total: 254864]
File line # 3   [ rised: 648     total: 255512]
File line # 4   [ rised: 632     total: 256144]
File line # 5   [ rised: 640     total: 256784]
File line # 6   [ rised: 640     total: 257424]
File line # 7   [ rised: 640     total: 258064]
File line # 8   [ rised: 640     total: 258704]
File line # 9   [ rised: 704     total: 259408]
File line # 10  [ rised: 656     total: 260064]
File line # 11  [ rised: 624     total: 260688]
File line # 12  [ rised: 640     total: 261328]
File line # 13  [ rised: 640     total: 261968]
File line # 14  [ rised: 640     total: 262608]
File line # 15  [ rised: 640     total: 263248]
File line # 16  [ rised: 640     total: 263888]
File line # 17  [ rised: 768     total: 264656]
File line # 18  [ rised: 640     total: 265296]
File line # 19  [ rised: 640     total: 265936]
File line # 20  [ rised: 640     total: 266576]
File line # 21  [ rised: 640     total: 267216]
File line # 22  [ rised: 640     total: 267856]
...
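Since the growth comes from keeping every record in the array, one possible way around it is to process each record as it is read and never store the full set. The following is only a sketch under assumptions: readNgrams() is a hypothetical helper, the file is assumed to be tab-separated as in the example above, generators require PHP 5.5 or later, and a small temporary file stands in for w2_.txt.

```php
<?php
// Hypothetical helper: yields one parsed record at a time instead of building an array.
function readNgrams($path)
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    try {
        while (($line = fgets($handle)) !== false) {
            $del = explode("\t", rtrim($line, "\r\n"));
            if (count($del) < 3) {
                continue; // skip blank or malformed lines
            }
            yield array($del[0], $del[1] . ' ' . $del[2]);
        }
    } finally {
        fclose($handle);
    }
}

// Demo with a small temporary file in place of w2_.txt (contents are made up).
$path = tempnam(sys_get_temp_dir(), 'ngrams');
file_put_contents($path, "12\tgood\tmovie\n7\tbad\tplot\n\n3\tgreat\tcast\n");

$count = 0;
foreach (readNgrams($path) as $record) {
    // Process each record here (e.g. update counters) instead of storing it.
    $count++;
}
unlink($path);

echo "$count records processed\n"; // prints "3 records processed"
```

With this shape only one line and one record are held at a time, so memory usage should stay roughly flat regardless of file size. Whether it fits depends on whether the classifier really needs all bigrams in memory at once.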

OLD ANSWER:

Not sure how much this will help.

1) Try replacing:

foreach($new as $ngram)

with

foreach($new as &$ngram)

foreach iterates over a copy of the array. If you iterate by reference ('&$ngram'), you operate on the original elements instead, which may save memory, although PHP's copy-on-write behaviour often makes the difference small.

2) If a variable is no longer used, clear it:

$del = null;

3) You can add to your source code:

echo memory_get_usage() . "\n";

so you will be able to see where too much memory is being consumed.
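As a slightly fuller sketch of that idea, the snippet below prints the change in memory between checkpoints. memoryCheckpoint() is a hypothetical helper written for this illustration; memory_get_usage() and memory_get_peak_usage() are built-in PHP functions. The exact numbers depend on the PHP version and platform.

```php
<?php
// Hypothetical helper: reports current usage and the change since the previous checkpoint.
function memoryCheckpoint($label, &$previous)
{
    $current = memory_get_usage();
    $line = sprintf('%s: %d bytes (change: %+d)', $label, $current, $current - $previous);
    $previous = $current;
    return $line;
}

$previous = memory_get_usage();
echo memoryCheckpoint('start', $previous), "\n";

$data = range(1, 100000); // allocate something noticeable
echo memoryCheckpoint('after range()', $previous), "\n";

unset($data); // release it again
echo memoryCheckpoint('after unset()', $previous), "\n";

// Highest usage reached so far during the script.
echo 'peak: ', memory_get_peak_usage(), " bytes\n";
```

Placing such checkpoints before and after each stage (reading, splitting, building the array) should show which step accounts for most of the growth.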

Good luck!
