How do I increase the number of features in PHP-AI (PHP-ML)?
I am building a logistic regression platform in PHP. The following code snippet works fine when there is only one feature in the data frame, for example a CSV file like this:
"sample","language"
"Hello, how are you?","english",
"Je voudrais une boîte de chocolats.","french"
...
However, when I try to train the AI with 2 features based on the Titanic survival data (hypothesis: does the number of siblings and spouses affect the survival rate?) with a data frame like this:
"SibSp","Parch","Survived",
"1", "1", "1",
"3", "3", "1",
"4", "1", "0"
...
I am getting this error:
Phpml\Exception\InvalidArgumentException Size of given arrays does not match
My code snippet looks like this. $request->features holds the number of features in the data frame; the column after the features (features + 1) holds the actual outcome (1 = survived, 0 = died):
$dataset = new CsvDataset($file, (int) $request->features);
$vectorizer = new TokenCountVectorizer(new WordTokenizer());
$tfIdfTransformer = new TfIdfTransformer();

// Collect the values column by column: $samples[feature][row]
$samples = [];
for ($i = 0; $i <= $request->features - 1; $i++):
    foreach ($dataset->getSamples() as $sample):
        $samples[$i][] = $sample[$i];
    endforeach;
endfor;

// Vectorize and weight each feature column in turn
for ($i = 0; $i <= count($samples) - 1; $i++):
    $vectorizer->fit($samples[$i]);
    $vectorizer->transform($samples[$i]);
    $tfIdfTransformer->fit($samples[$i]);
    $tfIdfTransformer->transform($samples[$i]);
endfor;

$dataset = new ArrayDataset($samples, $dataset->getTargets()); # This throws the error
I am using PHP-AI/PHP-ML, and here is an example, provided by the framework, of how the AI works with a data frame that has only 1 feature.
I understand the error: $dataset->getTargets() only holds 1 array, whereas $samples holds 2 arrays. However, this has got me stumped, since that is how it should be (in theory).
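To make the mismatch concrete, here is a minimal sketch in plain PHP. It makes no PHP-ML calls, and the three rows are illustrative values, not the real data set. It only compares the outer count of a column-by-column $samples array with the number of targets, which is the comparison the error message points at.

```php
<?php
// Illustrative column-major layout: one inner array per feature.
$samples = [
    ['1', '3', '4'], // SibSp column
    ['1', '3', '1'], // Parch column
];
$targets = ['1', '1', '0']; // one label per passenger

// The outer count is the number of features, not the number of passengers.
$sampleCount = count($samples);
$targetCount = count($targets);
$sizesMatch = $sampleCount === $targetCount;

printf(
    "samples: %d, targets: %d, match: %s\n",
    $sampleCount,
    $targetCount,
    $sizesMatch ? 'yes' : 'no'
);
```

With one feature the outer array happens to be laid out per row already, which would explain why the single-feature case never hit this check.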
I am storing the classifier (or trained AI) as a serialised object inside my database once it has been trained, so that it remembers its trained state. Everything works fine when I only use a data frame with one feature. Does anyone with experience using PHP-AI within the PHP-ML library know how to help?
How can I increase the number of features inside PHP-AI?
Update to show what values my arrays hold:
$samples looks like this (array of siblings, array of spouses):
array ( 0 => array ( 0 => array ( ), 1 => array ( ), 2 => array ( ), 3 => array ( ), 4 => array ( ), 5 => array ( ), 6 => array ( ), 7 => array ( ), ), 1 => array ( 0 => array ( ), 1 => array ( ), 2 => array ( ), 3 => array ( ), 4 => array ( ), 5 => array ( ), 6 => array ( ), 7 => array ( ), ), )
$dataset->getTargets() looks like this (survived or died):
array ( 0 => '1', 1 => '1', 2 => '0', 3 => '1', 4 => '0', 5 => '0', 6 => '1', 7 => '1', )
I believe that the $samples array should be 1 array holding child arrays of [SibSp, Spous]. I cannot think how to re-organise the array to be like this.
After fiddling around with the code and researching the error and how to get around it, I realised that the $samples data should be expressed as
Array [ 0 => [SibSp, Spous], 1 => [SibSp, Spous], ... ]
So by rearranging the data like so:
$result = [];
foreach ($samples as $arr) {
    foreach ($arr as $k => $v) {
        // Row $k collects one value from each feature column
        $result[$k][] = $v;
    }
}
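The rearrangement above is a transpose from [feature][row] to [row][feature]. As a possible alternative, PHP's built-in array_map with a null callback zips several arrays together, which gives the same shape when there are two or more feature columns. The values below are illustrative. With exactly one column, array_map(null, ...) returns the input array unchanged rather than an array of one-element rows, so that case would need separate handling.

```php
<?php
// Illustrative column-major input: one inner array per feature.
$samples = [
    ['1', '3', '4'], // SibSp column
    ['1', '3', '1'], // Parch column
];

// Zip the columns together: row k receives element k of every column.
// Caveat: this only nests the rows when at least two columns are passed.
$result = array_map(null, ...$samples);

foreach ($result as $index => $row) {
    printf("row %d: [%s]\n", $index, implode(', ', $row));
}
```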
I can achieve the desired outcome. I still had to push the samples into the vectorizer as $samples, but the final dataset had to be built from the rearranged $result:
for ($i = 0; $i <= count($samples) - 1; $i++):
    $vectorizer->fit($samples[$i]);
    $vectorizer->transform($samples[$i]);
    $tfIdfTransformer->fit($samples[$i]);
    $tfIdfTransformer->transform($samples[$i]);
endfor;

// One [SibSp, Spous] row per target, so the sizes now match
$dataset = new ArrayDataset($result, $dataset->getTargets());
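As a cross-check on the final shape, the sketch below splits CSV rows into per-row feature arrays and a parallel list of targets using only PHP built-ins (str_getcsv and array_slice). It does not call PHP-ML. The $lines and $featureCount names are hypothetical stand-ins for the uploaded file and $request->features, and the three rows are illustrative.

```php
<?php
// Hypothetical stand-ins for the CSV body and $request->features.
$lines = [
    '"1","1","1"',
    '"3","3","1"',
    '"4","1","0"',
];
$featureCount = 2;

$rows = [];    // one [SibSp, Parch] array per passenger
$targets = []; // one outcome per passenger

foreach ($lines as $line) {
    $fields = str_getcsv($line, ',', '"', '\\');
    // The first $featureCount columns are features, the next one is the outcome.
    $rows[] = array_slice($fields, 0, $featureCount);
    $targets[] = $fields[$featureCount];
}

printf("rows: %d, targets: %d\n", count($rows), count($targets));
```

Because each row and its target are appended in the same loop iteration, the two arrays always have the same length, however many feature columns there are.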