Extract MultiClass results from IDataView after performing ITransformer.Transform in ML.NET

I am trying to use ML.NET generically, without having to create a class for the input and output of a model. To do that, I first create a model like this:

    public static (ITransformer model, double accuracy) TrainMultiClassModel(MulticlassExperimentSettings experimentSettings, MLContext mlContext, IDataView myview, string LabelName)
    {
        ITransformer trainedModel;
        MulticlassClassificationExperiment experiment = mlContext.Auto().CreateMulticlassClassificationExperiment(experimentSettings);

        ExperimentResult<MulticlassClassificationMetrics> experimentResult = experiment.Execute(myview, LabelName);
        RunDetail<MulticlassClassificationMetrics> best = experimentResult.BestRun;

        trainedModel = best.Model;

        return (trainedModel, best.ValidationMetrics.MacroAccuracy);
    }

Here myview contains data loaded from a CSV file with correctly set DataKinds.

Example of the data: (image not preserved)

Then I execute that model by running something like this:

    MemoryStream modelStream = new MemoryStream(ModelData);
    ITransformer trainedModel = mlContext.Model.Load(modelStream, out var modelInputSchema);
    var predictions = trainedModel.Transform(myview);

Again, myview contains data from a CSV file, just with the predicted column empty.

Now we have "predictions", which is of type IDataView.

For regression results, that's easy. Look for the column named "Score" and load it as float:

float[] scoreColumn = predictions.GetColumn<float>("Score").ToArray();

But how does it work for MultiClass experiments? There is a column called "PredictedLabel" of type "String", but it contains numbers between 0 and 1 when read like this:

var labelColumn = predictions.Schema.FirstOrDefault(s => s.Name == "PredictedLabel" && s.IsHidden == false);
string[] scoreColumn = predictions.GetColumn<string>(labelColumn).ToArray();

How do I get the actual names of the (in this case) Species? Or do I have to map the numbers to the names somehow? Which mapping table do I use for that?

Thank you in advance.

Edit: the code by Eric gave this list:

1.4
1.9
0.2
0.4
0.3
0.1
0.5
0.6
1.5
1.3
1.6
1.0
1.1
1.8
1.2
1.7
2.5
2.1
2.2
2.0
2.4
2.3

That is 22 values, which is weird: none of the correct species names has 22 characters (in case those are the characters of a name), and I only supplied 4 rows of data to predict. Meanwhile, "PredictedLabel" does output 4 values, but they are still numbers: (image not preserved)

But now I am wondering: how do I read this field? Maybe it contains the answer: (image not preserved)

What you want to use is a method called GetKeyValues. This will give you a VBuffer<ReadOnlyMemory<char>>, where each string in the buffer is the "value" for the corresponding index into the "keys" or "classes" in your multi-class classification model.

var predictions = trainedModel.Transform(myview);

var labelColumn = predictions.Schema[labelName]; // this is "Species" in your example above

VBuffer<ReadOnlyMemory<char>> keys = default;
labelColumn.GetKeyValues(ref keys);

foreach (var key in keys.DenseValues())
{
    Console.WriteLine(key);
}
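
As a rough sketch of how those key values could be used: when a "PredictedLabel" column is key-typed (which is an assumption here; it depends on whether the pipeline already maps keys back to values), each prediction is a 1-based index into the key values, with 0 meaning "missing". The class and method names below are hypothetical helpers, not part of ML.NET, and the sketch assumes the key values are text.

```csharp
using System;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;

public static class PredictionDecoding
{
    // Hypothetical helper: turns key-typed predictions into class names.
    // Assumes the column is a key column backed by uint with text key values.
    public static string[] DecodePredictedLabels(IDataView predictions, string columnName = "PredictedLabel")
    {
        DataViewSchema.Column column = predictions.Schema[columnName];

        if (!column.HasKeyValues())
            throw new InvalidOperationException($"Column '{columnName}' carries no key values.");

        // Read the lookup table: position i holds the name for key i + 1.
        VBuffer<ReadOnlyMemory<char>> keyValues = default;
        column.GetKeyValues(ref keyValues);
        string[] names = keyValues.DenseValues().Select(v => v.ToString()).ToArray();

        // Keys are 1-based; 0 is reserved for a missing value.
        uint[] keys = predictions.GetColumn<uint>(column).ToArray();
        return keys.Select(k => k == 0 ? "(missing)" : names[k - 1]).ToArray();
    }
}
```

If the column is already of type String, as in the question, this lookup is unnecessary and the values can be read directly.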

For the sake of completeness, and to resolve the confusion in the initial question, here is the answer.

First, why did "PredictedLabel" and "GetKeyValues" (from Eric's answer) not provide usable results? The problem was the way I used the IDataView. When training, I loaded the whole CSV, including the "ID" column, which I did not provide when executing the model (as that column is of no value when using the model). After switching to always omitting the "ID" column and using the same CSV layout in training and execution, both Eric's approach and mine started working.

So when you want to interpret your results in the correct format, first check whether a "PredictedLabel" column exists in the resulting DataView schema.

if (predictions.Schema.Any(s => s.Name == "PredictedLabel"))

If it does, check its DataType. That is how you can differentiate between MultiClass and Binary results:

    var labelColumn = predictions.Schema.FirstOrDefault(s => s.Name == "PredictedLabel" && s.IsHidden == false);
    if (labelColumn.Type.ToString() == "Boolean")
    {
        bool[] binaryResults = predictions.GetColumn<bool>(labelColumn).ToArray();
    }

(or)

    if (labelColumn.Type.ToString() == "String")
    {
        string[] multiclassResults = predictions.GetColumn<string>(labelColumn).ToArray();
    }

multiclassResults will now contain your MultiClass results as strings.

In case there is no PredictedLabel, there should be a "Score" column, which contains your regression results:

float[] regressionResults = predictions.GetColumn<float>("Score").ToArray();
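
The checks above could be combined into one helper. The following is only a sketch of that idea, not a definitive implementation: PredictionReader and ReadPredictions are hypothetical names, and it tests the column type with the BooleanDataViewType and TextDataViewType classes instead of comparing Type.ToString() against a string, which is an alternative to the approach shown above rather than a required change.

```csharp
using System;
using System.Globalization;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;

public static class PredictionReader
{
    // Hypothetical helper: returns every prediction as text,
    // whatever kind of model produced the IDataView.
    public static string[] ReadPredictions(IDataView predictions)
    {
        DataViewSchema.Column? label = predictions.Schema.GetColumnOrNull("PredictedLabel");

        if (label is null)
        {
            // No PredictedLabel: treat the output as regression and read "Score".
            return predictions.GetColumn<float>("Score")
                .Select(s => s.ToString(CultureInfo.InvariantCulture))
                .ToArray();
        }

        DataViewSchema.Column column = label.Value;

        // Binary classification: PredictedLabel is a Boolean column.
        if (column.Type is BooleanDataViewType)
            return predictions.GetColumn<bool>(column).Select(b => b.ToString()).ToArray();

        // Multiclass with the label already mapped back to text.
        if (column.Type is TextDataViewType)
            return predictions.GetColumn<string>(column).ToArray();

        throw new NotSupportedException($"Unhandled PredictedLabel type: {column.Type}");
    }
}
```

A call such as PredictionReader.ReadPredictions(trainedModel.Transform(myview)) would then cover the three cases described here; other label types (for example key-typed or numeric labels) are deliberately left unhandled and throw.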
