
Running out of memory on PHP-ML

I am trying to implement sentiment analysis with PHP-ML. I have a training data set of roughly 15,000 entries. The code works, but only if I reduce the data set to 100 entries. When I run it on the full data set, I get this error:

Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 917504 bytes) in C:\Users\<username>\Documents\Github\phpml\vendor\php-ai\php-ml\src\Phpml\FeatureExtraction\TokenCountVectorizer.php on line 95

The two files I have are index.php:

<?php 

declare(strict_types=1);
namespace PhpmlExercise;

include 'vendor/autoload.php';

include 'SentimentAnalysis.php';

use PhpmlExercise\Classification\SentimentAnalysis;
use Phpml\Dataset\CsvDataset;
use Phpml\Dataset\ArrayDataset;
use Phpml\FeatureExtraction\TokenCountVectorizer;
use Phpml\Tokenization\WordTokenizer;
use Phpml\CrossValidation\StratifiedRandomSplit;
use Phpml\FeatureExtraction\TfIdfTransformer;
use Phpml\Metric\Accuracy;
use Phpml\Classification\SVC;
use Phpml\SupportVectorMachine\Kernel;

$dataset = new CsvDataset('clean_tweets2.csv', 1, true);
$vectorizer = new TokenCountVectorizer(new WordTokenizer());
$tfIdfTransformer = new TfIdfTransformer();
$samples = [];
foreach ($dataset->getSamples() as $sample) {
    $samples[] = $sample[0];
}
$vectorizer->fit($samples);
$vectorizer->transform($samples);
$tfIdfTransformer->fit($samples);
$tfIdfTransformer->transform($samples);
$dataset = new ArrayDataset($samples, $dataset->getTargets());
$randomSplit = new StratifiedRandomSplit($dataset, 0.1);

$trainingSamples = $randomSplit->getTrainSamples();
$trainingLabels     = $randomSplit->getTrainLabels();

$testSamples = $randomSplit->getTestSamples();
$testLabels      = $randomSplit->getTestLabels();

$classifier = new SentimentAnalysis();
$classifier->train($randomSplit->getTrainSamples(), $randomSplit->getTrainLabels());
$predictedLabels = $classifier->predict($randomSplit->getTestSamples());

echo 'Accuracy: '.Accuracy::score($randomSplit->getTestLabels(), $predictedLabels);

And SentimentAnalysis.php:

<?php

namespace PhpmlExercise\Classification;
use Phpml\Classification\NaiveBayes;

class SentimentAnalysis
{
    protected $classifier;

    public function __construct()
    {
        $this->classifier = new NaiveBayes();
    }
    public function train($samples, $labels)
    {
        $this->classifier->train($samples, $labels);
    }

    public function predict($samples)
    {
        return $this->classifier->predict($samples);
    }
}

I am pretty new to machine learning and PHP-ML, so I am not sure how to work out where the issue is, or whether it can be fixed at all without a ton of memory. As far as I can tell, the error is triggered by the TokenCountVectorizer on line 22 of the index file. Does anyone have any idea what may be causing this issue, or has anyone run into this before?

The link to PHP-ML is here: http://php-ml.readthedocs.io/en/latest/

Thank you

This error comes from loading more into memory than PHP is set up to handle in one process. There are other causes, but they are much less common.

In your case, your PHP instance seems to be configured to allow at most 128 MB of memory (the 134217728 bytes in the error message). For machine learning that is not very much, and with large data sets you will almost certainly hit that limit.

To raise the amount of memory PHP is allowed to use to 1 GB, edit your php.ini file and set

memory_limit = 1024M

If you don't have access to your php.ini file but still have the permissions to change the setting you can do it at runtime using

<?php
    ini_set('memory_limit', '1024M');

Alternatively, if you run Apache with PHP loaded as a module (mod_php), you can try to set the memory limit with a .htaccess directive

php_value memory_limit 1024M

Note that most shared hosting plans and similar environments impose a hard, and often low, limit on the amount of memory you are allowed to use.
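
Because a host may silently refuse a higher limit, it can help to check what actually took effect. The following is a minimal sketch, not a definitive implementation: iniSizeToBytes is a hypothetical helper written for this illustration, and the 1024M value is simply the one used above.

```php
<?php

declare(strict_types=1);

/**
 * Convert a php.ini shorthand size such as "128M" or "1G" to bytes.
 * Hypothetical helper for illustration only; "-1" means "no limit".
 */
function iniSizeToBytes(string $value): int
{
    $value = trim($value);
    if ($value === '' || $value === '-1') {
        return $value === '' ? 0 : -1;
    }

    $number = (int) $value;                  // leading digits, e.g. 128 from "128M"
    $unit = strtoupper(substr($value, -1));  // last character, e.g. "M"

    switch ($unit) {
        case 'G':
            return $number * 1024 * 1024 * 1024;
        case 'M':
            return $number * 1024 * 1024;
        case 'K':
            return $number * 1024;
        default:
            return $number;                  // plain byte count
    }
}

// ini_set() returns the previous value on success and false on failure.
$previous = ini_set('memory_limit', '1024M');
if ($previous === false) {
    echo 'Could not change memory_limit; the host may enforce its own cap.' . PHP_EOL;
}

// Read the setting back to see what PHP is really using.
$limit = (string) ini_get('memory_limit');
echo 'Effective memory_limit: ' . $limit . ' (' . iniSizeToBytes($limit) . ' bytes)' . PHP_EOL;
```

If the reported value is still 128M after the ini_set call, the limit is being enforced somewhere you cannot override from the script.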

Other things you can do to help are

  • If you load data from files, look at fgets and SplFileObject::fgets to read files line by line instead of reading the complete file into memory at once.
  • Make sure you are running as recent a version of PHP as possible.
  • Make sure your PHP extensions are up to date.
  • Disable PHP extensions you don't use.
  • Use unset on data or large objects that you are done with and no longer need in memory. Note that PHP's garbage collector will not necessarily free the memory right away. By design, it does so when it judges that the required CPU cycles are available or just before the script would run out of memory, whichever occurs first.
  • You can use something like echo memory_get_usage() / 1024.0 . ' kb' . PHP_EOL; to print memory usage at a given place in your program, to profile how much memory different parts use.
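
The line-by-line reading mentioned in the first point could look roughly like this. It is a sketch under stated assumptions: readLines is a hypothetical helper name, and the temporary file only stands in for a real data file. Note that this reduces memory used while reading; it does not by itself shrink whatever you build from the lines afterwards.

```php
<?php

declare(strict_types=1);

/**
 * Yield the lines of a file one at a time, so only the current line
 * is held in memory. Hypothetical helper for illustration.
 */
function readLines(string $path): Generator
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException('Cannot open ' . $path);
    }

    try {
        // fgets() returns false once the end of the file is reached.
        while (($line = fgets($handle)) !== false) {
            yield rtrim($line, "\r\n");
        }
    } finally {
        fclose($handle);
    }
}

// Demo with a small temporary file standing in for a real data set.
$path = tempnam(sys_get_temp_dir(), 'lines');
file_put_contents($path, "first tweet\nsecond tweet\nthird tweet\n");

$count = 0;
foreach (readLines($path) as $line) {
    // Process one line here, then let it go out of scope.
    $count++;
}
echo 'Lines read: ' . $count . PHP_EOL;

unlink($path);
```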

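To see the effect of unset and to profile memory at several points, something like the following may help. It is only a sketch: reportMemory is a hypothetical helper, and the range() call is an arbitrary stand-in for a large data set. gc_collect_cycles() is called to force a collection of circular references; for a plain array like this one it is not strictly needed.

```php
<?php

declare(strict_types=1);

/**
 * Print the current memory usage with a label and return it in bytes.
 * Hypothetical helper for illustration.
 */
function reportMemory(string $label): int
{
    $bytes = memory_get_usage();
    echo sprintf('%-18s %12.1f kb', $label, $bytes / 1024.0) . PHP_EOL;

    return $bytes;
}

$before = reportMemory('start');

// Arbitrary stand-in for a large data set.
$data = range(1, 200000);
$withData = reportMemory('after allocation');

// Release the data once it is no longer needed.
unset($data);
gc_collect_cycles();
$after = reportMemory('after unset');

// The peak shows the highest usage reached so far in this script.
echo sprintf('%-18s %12.1f kb', 'peak', memory_get_peak_usage() / 1024.0) . PHP_EOL;
```

Comparing the numbers before and after each stage of the pipeline (loading, vectorizing, transforming, training) should show which step is responsible for most of the memory.
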

 