Add a custom column to an IDataView in ML.NET

Question

I'd like to add a custom column after loading my IDataView from a file. In each row, the column value should be the sum of the previous 2 values, a sort of Fibonacci series.

I was thinking of creating a custom transformer, but I couldn't find anything that would help me understand how to proceed. I also tried cloning the ML.NET Git repository to see how other transformers are implemented, but many classes are marked as internal, so I cannot reuse them in my project.
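To pin down what "sum of the previous 2 values" means, here is a minimal plain C# sketch with no ML.NET dependency. It assumes the sum is taken over an existing numeric column in the two preceding rows, and that rows without two predecessors get 0; the question does not say how the series is seeded, and the class name `PreviousTwoSum` is hypothetical. The point is that each output depends on state carried across rows, which a per-row mapping cannot see on its own.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper illustrating the desired column semantics.
public static class PreviousTwoSum
{
    // For each input value, emit the sum of the values in the two preceding rows.
    // Rows without two predecessors get 0 (an assumed seed, not from the question).
    public static List<float> Compute(IEnumerable<float> values)
    {
        var result = new List<float>();
        float? prev1 = null; // value in the immediately preceding row
        float? prev2 = null; // value two rows back
        foreach (var v in values)
        {
            result.Add(prev1.HasValue && prev2.HasValue ? prev1.Value + prev2.Value : 0f);
            prev2 = prev1; // shift the two-row window forward
            prev1 = v;
        }
        return result;
    }
}
```

For the input 1, 2, 3, 4, 5 this yields 0, 0, 3, 5, 7.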

Answer 1

There is a way to create a custom transform with CustomMapping.

Here's an example I used for this answer.

The input and output classes:

class InputData
{
    public int Age { get; set; }
}

class CustomMappingOutput
{
    public string AgeName { get; set; }
}

class TransformedData
{
    public int Age { get; set; }

    public string AgeName { get; set; }
}

Then, in the ML.NET program:

using System;
using System.Collections.Generic;
using Microsoft.ML;

MLContext mlContext = new MLContext();

var samples = new List<InputData>
{
    new InputData { Age = 16 },
    new InputData { Age = 35 },
    new InputData { Age = 60 },
    new InputData { Age = 28 },
};

var data = mlContext.Data.LoadFromEnumerable(samples);

Action<InputData, CustomMappingOutput> mapping =
    (input, output) =>
    {
        if (input.Age < 18)
        {
            output.AgeName = "Child";
        }
        else if (input.Age < 55)
        {
            output.AgeName = "Man";
        }
        else
        {
            output.AgeName = "Grandpa";
        }
    };

var pipeline = mlContext.Transforms.CustomMapping(mapping, contractName: null);

var transformer = pipeline.Fit(data);
var transformedData = transformer.Transform(data);

var dataEnumerable = mlContext.Data.CreateEnumerable<TransformedData>(transformedData, reuseRowObject: true);

foreach (var row in dataEnumerable)
{
    Console.WriteLine($"{row.Age}\t {row.AgeName}");
}
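CustomMapping as used above maps each row independently, so on its own it cannot look at previous rows. A minimal sketch of one way to carry values across rows is ML.NET's StatefulCustomMapping, assuming your Microsoft.ML version ships it (it is not in every older release). The column name `Value`, the classes below, and the zero seed for the first two rows are all hypothetical assumptions for illustration, not taken from the question or the answer.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.ML;

// Hypothetical input schema: a single numeric column named "Value".
public class ValueRow { public float Value { get; set; } }

// Output of the mapping: the new column appended to the IDataView.
public class PrevSumOutput { public float PreviousTwoSum { get; set; } }

// State carried from row to row while a cursor reads the data.
public class PrevSumState
{
    public float Prev1 { get; set; } // value in the preceding row
    public float Prev2 { get; set; } // value two rows back
    public int Seen { get; set; }    // rows seen so far
}

// Shape used to read the transformed data back.
public class ValueWithPrevSum
{
    public float Value { get; set; }
    public float PreviousTwoSum { get; set; }
}

public static class PrevSumPipeline
{
    public static List<float> Run(IEnumerable<float> values)
    {
        var mlContext = new MLContext();
        var rows = new List<ValueRow>();
        foreach (var v in values)
            rows.Add(new ValueRow { Value = v });
        IDataView data = mlContext.Data.LoadFromEnumerable(rows);

        // Called per row in cursor order; reads and then updates the state.
        Action<ValueRow, PrevSumOutput, PrevSumState> mapAction = (input, output, state) =>
        {
            // Rows without two predecessors get 0 (assumed seed).
            output.PreviousTwoSum = state.Seen >= 2 ? state.Prev1 + state.Prev2 : 0f;
            state.Prev2 = state.Prev1;
            state.Prev1 = input.Value;
            state.Seen++;
        };

        // Initializes a fresh state object before a cursor starts reading rows.
        Action<PrevSumState> initState = state =>
        {
            state.Prev1 = 0f;
            state.Prev2 = 0f;
            state.Seen = 0;
        };

        var pipeline = mlContext.Transforms.StatefulCustomMapping(mapAction, initState, contractName: null);
        IDataView transformed = pipeline.Fit(data).Transform(data);

        var result = new List<float>();
        foreach (var row in mlContext.Data.CreateEnumerable<ValueWithPrevSum>(transformed, reuseRowObject: false))
            result.Add(row.PreviousTwoSum);
        return result;
    }
}
```

Because the state follows the order in which rows are read, the new column depends on row order, so this step should run before anything that shuffles the data.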

Answer 2

It's easy. I'm assuming you know how to use pipelines.

This is part of my project, where I merge two columns together:

// mapping (a CustomMapping action) and customTransform are defined elsewhere in my project.
IEstimator<ITransformer> pipeline = mlContext.Transforms.CustomMapping(mapping, contractName: null)
                            .Append(mlContext.Transforms.Text.FeaturizeText(inputColumnName: "question1", outputColumnName: "question1Featurized"))
                            .Append(mlContext.Transforms.Text.FeaturizeText(inputColumnName: "question2", outputColumnName: "question2Featurized"))
                            .Append(mlContext.Transforms.Concatenate("Features", "question1Featurized", "question2Featurized"))
                            //.Append(mlContext.Transforms.NormalizeMinMax("Features"))
                            //.AppendCacheCheckpoint(mlContext)
                            .Append(mlContext.BinaryClassification.Trainers.SdcaLogisticRegression(labelColumnName: nameof(customTransform.Label), featureColumnName: "Features"));

As you can see, the two columns question1Featurized and question2Featurized are combined into Features, which will be created and can be used like any other column of the IDataView. The Features column does not need to be declared in a separate class.

So in your case, you should first transform the columns according to their data type: for strings you can do what I did, and for numeric values use a custom Transformer/CustomMapping.

The documentation of the Concatenate function might help as well!
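A minimal sketch of the "convert first, then concatenate" idea for numeric columns: Concatenate expects all of its inputs to share an item type, so an integer column is converted to Single before being merged with a float column. The class `TwoColumns` and the column names `A`, `AFloat` and `B` are hypothetical. As in the answer, the resulting Features column is never declared in a class; it is read back by name.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.ML;
using Microsoft.ML.Data;

// Hypothetical input with one integer and one float column.
public class TwoColumns
{
    public int A { get; set; }
    public float B { get; set; }
}

public static class ConcatDemo
{
    public static float[][] Run()
    {
        var mlContext = new MLContext();
        IDataView data = mlContext.Data.LoadFromEnumerable(new List<TwoColumns>
        {
            new TwoColumns { A = 1, B = 0.5f },
            new TwoColumns { A = 2, B = 1.5f },
        });

        // Convert A (Int32) to Single so both inputs share an item type,
        // then concatenate them into a vector column named "Features".
        var pipeline = mlContext.Transforms.Conversion.ConvertType("AFloat", "A", DataKind.Single)
            .Append(mlContext.Transforms.Concatenate("Features", "AFloat", "B"));

        IDataView transformed = pipeline.Fit(data).Transform(data);

        // "Features" exists only in the transformed schema; read it by name.
        var features = new List<float[]>();
        foreach (var f in transformed.GetColumn<float[]>("Features"))
            features.Add((float[])f.Clone()); // copy defensively in case a buffer is reused
        return features.ToArray();
    }
}
```

Each returned row holds the converted A value followed by B, for example [1, 0.5] for the first row.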

2 answers

Answer 1 (score 1, 2019-06-25 21:30:38)

Answer 2 (score 0, 2020-05-28 15:32:00)
