
How can I increase the number of features in PHP-AI/PHP-ML?

I am building a logistic regression platform in PHP. The following code snippet works fine when the data frame contains only one feature, for example a CSV file like this:

"sample","language"
"Hello, how are you?","english",
"Je voudrais une boîte de chocolats.","french"
...

However, when I try to train the AI with 2 features based on the Titanic survival data (hypothesis: does the number of siblings and spouses affect the survival rate?), using a data frame like this:

"SibSp","Parch","Survived",
"1", "1", "1",
"3", "3", "1",
"4", "1", "0"
...

I am getting this error:

Phpml\Exception\InvalidArgumentException: Size of given arrays does not match

My code snippet looks like this. $request->features holds the number of features in the data frame; the column after the last feature holds the actual outcome (1 = survived, 0 = died):

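use Phpml\Dataset\ArrayDataset;
use Phpml\Dataset\CsvDataset;
use Phpml\FeatureExtraction\TfIdfTransformer;
use Phpml\FeatureExtraction\TokenCountVectorizer;
use Phpml\Tokenization\WordTokenizer;

// Second argument: number of feature columns; the following column is the target.
$dataset = new CsvDataset($file, (int) $request->features);
$vectorizer = new TokenCountVectorizer(new WordTokenizer());
$tfIdfTransformer = new TfIdfTransformer();

$samples = [];

// Collect the values column by column: $samples[feature index][row index].
for ($i = 0; $i < $request->features; $i++):
    foreach ($dataset->getSamples() as $sample):
        $samples[$i][] = $sample[$i];
    endforeach;
endfor;

// Vectorize each feature column separately.
for ($i = 0; $i < count($samples); $i++):
    $vectorizer->fit($samples[$i]);
    $vectorizer->transform($samples[$i]);

    $tfIdfTransformer->fit($samples[$i]);
    $tfIdfTransformer->transform($samples[$i]);
endfor;

$dataset = new ArrayDataset($samples, $dataset->getTargets()); # This throws the error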

I am using PHP-AI/PHP-ML, and the framework provides an example of how the AI works with a data frame that has only 1 feature.

I understand the error: $dataset->getTargets() holds only 1 array, whereas $samples holds 2 arrays. However, this has me stumped, since that is how it should be (in theory).
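The mismatch described above can be reproduced with plain arrays. The following is a minimal sketch, not PHP-ML's actual code: it assumes that ArrayDataset compares the number of top-level entries in the samples with the number of targets, and the three rows are made up from the CSV excerpt above.

```php
<?php
// Stand-in data (assumption): feature values grouped by column,
// the same layout the column-by-column loop in the snippet produces.
$samplesByColumn = [
    ['1', '3', '4'], // SibSp column
    ['1', '3', '1'], // Parch column
];
$targets = ['1', '1', '0']; // one label per CSV row

// Assumed check: one top-level sample entry is expected per target.
$sampleCount = count($samplesByColumn); // number of columns, not rows
$targetCount = count($targets);         // number of rows
$sizesMatch = ($sampleCount === $targetCount);

echo $sizesMatch
    ? "sizes match\n"
    : "size mismatch: {$sampleCount} samples vs {$targetCount} targets\n";
```

With a single feature the column-major layout happens to be passed differently in the framework example, which may be why the problem only shows up once a second feature is added.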

I am storing the classifier (or trained AI) as a serialised object inside my database once it has been trained to remember its trained state. Everything works fine when I only use a data frame with one feature. Does anyone have experience using PHP-AI within the PHP-ML library that can help?

How can I increase the number of features inside PHP-AI?

Update to show what values my arrays hold:

$samples looks like this (array of SibSp values, array of Parch values):

array ( 0 => array ( 0 => array ( ), 1 => array ( ), 2 => array ( ), 3 => array ( ), 4 => array ( ), 5 => array ( ), 6 => array ( ), 7 => array ( ), ), 1 => array ( 0 => array ( ), 1 => array ( ), 2 => array ( ), 3 => array ( ), 4 => array ( ), 5 => array ( ), 6 => array ( ), 7 => array ( ), ), )

$dataset->getTargets() looks like this (survived or died):

array ( 0 => '1', 1 => '1', 2 => '0', 3 => '1', 4 => '0', 5 => '0', 6 => '1', 7 => '1', )

I believe that the $samples array should be 1 array holding child arrays of [SibSp, Parch]. I cannot think how to reorganise the array to be like this.

After experimenting with the code and researching the error and how to get around it, I realised that the $samples data should be expressed as:

[ 0 => [SibSp, Parch], 1 => [SibSp, Parch], ... ]

So by reshaping the data like so:

$result = [];
foreach ($samples as $arr) {
    foreach ($arr as $k => $v) {
        $result[$k][] = $v;
    }
}
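To make the reshaping concrete, here is a small self-contained sketch of the same transposition applied to made-up values. The function name transposeColumns and the sample data are illustrative only and are not part of PHP-ML.

```php
<?php
/**
 * Turn column-major data ($columns[feature][row]) into
 * row-major data ($rows[row][feature]).
 */
function transposeColumns(array $columns): array
{
    $rows = [];
    foreach ($columns as $column) {
        foreach ($column as $rowIndex => $value) {
            // Append this feature's value to the row it belongs to.
            $rows[$rowIndex][] = $value;
        }
    }
    return $rows;
}

// Hypothetical input: one array per feature column.
$columns = [
    ['1', '3', '4'], // SibSp
    ['1', '3', '1'], // Parch
];

$rows = transposeColumns($columns);
// $rows is now [['1', '1'], ['3', '3'], ['4', '1']]: one entry per CSV row.
```

After the transposition the outer array has one entry per row, so its length lines up with the length of the targets array.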

I can achieve the desired outcome. I still had to push the samples into the vectorizer as $samples, but the final dataset had to be built from the reshaped $result:

for ($i = 0; $i <= count($samples) - 1; $i++):
    $vectorizer->fit($samples[$i]);
    $vectorizer->transform($samples[$i]);

    $tfIdfTransformer->fit($samples[$i]);
    $tfIdfTransformer->transform($samples[$i]);
endfor;

$dataset = new ArrayDataset($result, $dataset->getTargets());
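One thing worth checking: the dump in the question shows every vectorized value as an empty array, so the token-count and TF-IDF steps may not be contributing anything for numeric columns. The sketch below is an alternative under stated assumptions, not the method used above: it assumes the features are plain numbers and that $dataset->getSamples() yields one array per CSV row (as the $sample[$i] indexing in the question implies). toNumericRows is a hypothetical helper, and a literal array stands in for the CSV data.

```php
<?php
/**
 * Cast every feature value in row-major sample data to float.
 * Hypothetical helper, not part of PHP-ML.
 */
function toNumericRows(array $rows): array
{
    return array_map(
        static fn (array $row): array => array_map('floatval', $row),
        $rows
    );
}

// Stand-in for $dataset->getSamples() and $dataset->getTargets() (assumption).
$rowsFromCsv = [['1', '1'], ['3', '3'], ['4', '1']];
$targets = ['1', '1', '0'];

$numericRows = toNumericRows($rowsFromCsv);

// With PHP-ML installed, the rows could then be passed on unchanged, e.g.:
// $dataset = new ArrayDataset($numericRows, $targets);
```

Whether skipping the text vectorizer is appropriate depends on the data; for free-text columns such as the language example, the vectorizer is still needed.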
