Running out of memory on PHP-ML

I am trying to implement a sentiment analysis with PHP-ML. I have a training data set of roughly 15000 entries. I have the code working; however, I have to reduce the data set down to 100 entries for it to work. When I try to run the full data set I get this error:

Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 917504 bytes) in C:\Users\<username>\Documents\Github\phpml\vendor\php-ai\php-ml\src\Phpml\FeatureExtraction\TokenCountVectorizer.php on line 95

The two files I have are index.php:

<?php 

declare(strict_types=1);
namespace PhpmlExercise;

include 'vendor/autoload.php';

include 'SentimentAnalysis.php';

use PhpmlExercise\Classification\SentimentAnalysis;
use Phpml\Dataset\CsvDataset;
use Phpml\Dataset\ArrayDataset;
use Phpml\FeatureExtraction\TokenCountVectorizer;
use Phpml\Tokenization\WordTokenizer;
use Phpml\CrossValidation\StratifiedRandomSplit;
use Phpml\FeatureExtraction\TfIdfTransformer;
use Phpml\Metric\Accuracy;
use Phpml\Classification\SVC;
use Phpml\SupportVectorMachine\Kernel;

$dataset = new CsvDataset('clean_tweets2.csv', 1, true);
$vectorizer = new TokenCountVectorizer(new WordTokenizer());
$tfIdfTransformer = new TfIdfTransformer();
$samples = [];
foreach ($dataset->getSamples() as $sample) {
    $samples[] = $sample[0];
}
$vectorizer->fit($samples);
$vectorizer->transform($samples);
$tfIdfTransformer->fit($samples);
$tfIdfTransformer->transform($samples);
$dataset = new ArrayDataset($samples, $dataset->getTargets());
$randomSplit = new StratifiedRandomSplit($dataset, 0.1);

$trainingSamples = $randomSplit->getTrainSamples();
$trainingLabels     = $randomSplit->getTrainLabels();

$testSamples = $randomSplit->getTestSamples();
$testLabels      = $randomSplit->getTestLabels();

$classifier = new SentimentAnalysis();
$classifier->train($randomSplit->getTrainSamples(), $randomSplit->getTrainLabels());
$predictedLabels = $classifier->predict($randomSplit->getTestSamples());

echo 'Accuracy: '.Accuracy::score($randomSplit->getTestLabels(), $predictedLabels);

And SentimentAnalysis.php:

<?php

namespace PhpmlExercise\Classification;
use Phpml\Classification\NaiveBayes;

class SentimentAnalysis
{
    protected $classifier;

    public function __construct()
    {
        $this->classifier = new NaiveBayes();
    }
    public function train($samples, $labels)
    {
        $this->classifier->train($samples, $labels);
    }

    public function predict($samples)
    {
        return $this->classifier->predict($samples);
    }
}

I am pretty new to machine learning and php-ml, so I am not really sure how to deduce where the issue is or if there is even a way to fix this without having a ton of memory. The most I can tell is that the error is happening in TokenCountVectorizer on line 22 of the index file. Does anyone have any idea what may be causing this issue, or has anyone run into this before?

The link to PHP-ML is here: http://php-ml.readthedocs.io/en/latest/

Thank you

This error comes from loading more data into memory than PHP is configured to handle in one process. There are other causes, but these are much less common.

In your case, your PHP instance is configured to allow a maximum of 128MB of memory to be used (134217728 bytes is exactly 128MB). For machine learning that is not very much, and if you use large datasets you will almost certainly hit that limit.

To raise the amount of memory PHP is allowed to use to 1GB, edit your php.ini file and set

memory_limit = 1024M

If you don't have access to your php.ini file but still have permission to change the setting, you can do it at runtime using

<?php
    ini_set('memory_limit', '1024M');
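
If you are unsure whether the override took effect, you can print the limit that is currently active with the standard ini_get function, for example:

<?php
    // Print the memory limit currently in effect for this script.
    echo 'memory_limit is ' . ini_get('memory_limit') . PHP_EOL;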

Alternatively, if you run Apache, you can try to set the memory limit using a .htaccess file directive

php_value memory_limit 1024M

Do note that most shared hosting solutions and the like have a hard, and often low, limit on the amount of memory you are allowed to use.

Other things you can do to help are:

  • If you load data from files, look at fgets and SplFileObject::fgets to read files line by line instead of reading the complete file into memory at once (a short sketch follows this list).
  • Make sure you are running as up-to-date a version of PHP as possible
  • Make sure PHP extensions are up to date
  • Disable PHP extensions you don't use
  • unset data or large objects that you are done with and don't need in memory anymore. Note that PHP's garbage collector will not necessarily free the memory right away. Instead, by design, it will do so when the required CPU cycles are available or before the script is about to run out of memory, whichever occurs first.
  • You can use something like echo memory_get_usage() / 1024.0 . ' kb' . PHP_EOL; to print memory usage at a given place in your program to try and profile how much memory different parts use.
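
As a rough illustration of the line-by-line approach mentioned above, the sketch below streams a CSV file with SplFileObject and prints memory usage as it goes. The file name clean_tweets2.csv and the (text, label) column layout are assumptions taken from the question's code; adapt them to your data.

<?php

// A minimal sketch: stream the CSV one row at a time instead of reading the
// whole file into memory at once. File name and column order are assumptions.
$file = new SplFileObject('clean_tweets2.csv');
$file->setFlags(SplFileObject::READ_CSV | SplFileObject::READ_AHEAD | SplFileObject::SKIP_EMPTY);

$samples = [];
$labels  = [];

foreach ($file as $i => $row) {
    if ($i === 0 || $row === [null]) {
        continue; // skip the header row and any blank lines
    }
    $samples[] = $row[0]; // tweet text
    $labels[]  = $row[1]; // sentiment label

    // Print memory usage every 1000 rows to see where memory grows.
    if ($i % 1000 === 0) {
        echo memory_get_usage() / 1024.0 . ' kb' . PHP_EOL;
    }
}

// Release objects you are done with before the next memory-heavy step.
unset($file);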
