Add custom column to IDataView in ML.NET
I'd like to add a custom column after loading my IDataView from a file. In each row, the column value should be the sum of the previous 2 values, a sort of Fibonacci series.
I was thinking of creating a custom transformer, but I couldn't find anything that helped me understand how to proceed. I also tried cloning the ML.NET Git repository to see how other transformers are implemented, but many classes are marked as internal, so I cannot reuse them in my project.
There is a way to create a custom transform with CustomMapping.
Here's an example I used for this answer.
The input and output classes:
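using System;
using System.Collections.Generic;
using Microsoft.ML;

// Schema of the rows loaded into the IDataView
class InputData
{
    public int Age { get; set; }
}

// Column produced by the custom mapping
class CustomMappingOutput
{
    public string AgeName { get; set; }
}

// Original column plus the mapped column, used to read the result back
class TransformedData
{
    public int Age { get; set; }
    public string AgeName { get; set; }
}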
Then, in the ML.NET program:
MLContext mlContext = new MLContext();
var samples = new List<InputData>
{
    new InputData { Age = 16 },
    new InputData { Age = 35 },
    new InputData { Age = 60 },
    new InputData { Age = 28 },
};
var data = mlContext.Data.LoadFromEnumerable(samples);

// Called once per row: reads InputData and fills in CustomMappingOutput
Action<InputData, CustomMappingOutput> mapping =
    (input, output) =>
    {
        if (input.Age < 18)
        {
            output.AgeName = "Child";
        }
        else if (input.Age < 55)
        {
            output.AgeName = "Man";
        }
        else
        {
            output.AgeName = "Grandpa";
        }
    };

// contractName: null works as long as the trained model is not saved and reloaded
var pipeline = mlContext.Transforms.CustomMapping(mapping, contractName: null);
var transformer = pipeline.Fit(data);
var transformedData = transformer.Transform(data);

// Read the rows back as TransformedData objects and print both columns
var dataEnumerable = mlContext.Data.CreateEnumerable<TransformedData>(transformedData, reuseRowObject: true);
foreach (var row in dataEnumerable)
{
    Console.WriteLine($"{row.Age}\t {row.AgeName}");
}
Easy thing. I am assuming you know how to use pipelines.
This is part of my project, where I merge two columns together:
IEstimator<ITransformer> pipeline = mlContext.Transforms.CustomMapping(mapping, contractName: null)
    .Append(mlContext.Transforms.Text.FeaturizeText(inputColumnName: "question1", outputColumnName: "question1Featurized"))
    .Append(mlContext.Transforms.Text.FeaturizeText(inputColumnName: "question2", outputColumnName: "question2Featurized"))
    .Append(mlContext.Transforms.Concatenate("Features", "question1Featurized", "question2Featurized"))
    //.Append(mlContext.Transforms.NormalizeMinMax("Features"))
    //.AppendCacheCheckpoint(mlContext)
    .Append(mlContext.BinaryClassification.Trainers.SdcaLogisticRegression(labelColumnName: nameof(customTransform.Label), featureColumnName: "Features"));
As you can see, the two columns question1Featurized and question2Featurized are combined into Features, which will be created and can be used like any other column of the IDataView. The Features column does not need to be declared in a separate class.
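To illustrate that a concatenated column behaves like any other IDataView column, here is a minimal sketch with two numeric columns instead of featurized text. It assumes the Microsoft.ML NuGet package is referenced; the class names `TwoNumbers`, `WithFeatures` and `ConcatenateDemo` and the column names `A` and `B` are hypothetical. `WithFeatures` exists only to read the result back with `CreateEnumerable`; the pipeline itself never needs it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;

// Hypothetical numeric input: two scalar columns per row.
public class TwoNumbers
{
    public float A { get; set; }
    public float B { get; set; }
}

// Only used to read the transformed data back; maps the "Features" vector column.
public class WithFeatures
{
    public float[] Features { get; set; }
}

public static class ConcatenateDemo
{
    // Concatenates columns A and B into a new vector column named "Features".
    public static List<float[]> Run(IEnumerable<TwoNumbers> rows)
    {
        var mlContext = new MLContext();
        IDataView data = mlContext.Data.LoadFromEnumerable(rows);

        // "Features" is created by the transform; no input class declares it.
        var pipeline = mlContext.Transforms.Concatenate("Features", "A", "B");
        IDataView transformed = pipeline.Fit(data).Transform(data);

        return mlContext.Data
            .CreateEnumerable<WithFeatures>(transformed, reuseRowObject: false)
            .Select(r => r.Features)
            .ToList();
    }

    public static void Main()
    {
        var rows = new List<TwoNumbers>
        {
            new TwoNumbers { A = 1, B = 2 },
            new TwoNumbers { A = 3, B = 4 },
        };
        foreach (var features in Run(rows))
            Console.WriteLine(string.Join(", ", features));
    }
}
```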
So in your case, you should first transform the columns according to their data type: if they are strings, you can do what I did; for numeric values, use a custom transformer/CustomMapping.
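One caveat for the Fibonacci-like column in the question: the `CustomMapping` delegate shown above receives one row at a time and cannot see earlier rows. ML.NET also provides `StatefulCustomMapping`, which passes a state object along with each row. The sketch below assumes a single numeric column named `Value` (hypothetical, as are all class names) and computes `Value[i-1] + Value[i-2]`, treating missing predecessors as 0. Because the state belongs to the cursor reading the data, the result is only meaningful when the rows are read sequentially in file order, as `CreateEnumerable` does here; treat it as a sketch to verify against the ML.NET version you use.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;

// Hypothetical schema: one numeric column read from the file.
public class ValueRow
{
    public float Value { get; set; }
}

// The new column produced by the mapping.
public class PrevTwoSumOutput
{
    public float PrevTwoSum { get; set; }
}

// State carried from one row to the next while a cursor reads the data.
public class PrevTwoState
{
    public float Prev1 { get; set; } // Value of the previous row
    public float Prev2 { get; set; } // Value of the row before that
}

public static class PrevTwoSumDemo
{
    public static List<float> Run(IEnumerable<ValueRow> rows)
    {
        var mlContext = new MLContext();
        IDataView data = mlContext.Data.LoadFromEnumerable(rows);

        // Output for row i is Value[i-1] + Value[i-2]; missing predecessors count as 0.
        Action<ValueRow, PrevTwoSumOutput, PrevTwoState> mapping = (input, output, state) =>
        {
            output.PrevTwoSum = state.Prev1 + state.Prev2;
            state.Prev2 = state.Prev1;
            state.Prev1 = input.Value;
        };

        // Initializes a fresh state object before any row is mapped.
        Action<PrevTwoState> init = state =>
        {
            state.Prev1 = 0;
            state.Prev2 = 0;
        };

        var pipeline = mlContext.Transforms.StatefulCustomMapping(mapping, init, contractName: null);
        IDataView transformed = pipeline.Fit(data).Transform(data);

        // A single sequential cursor reads every row once, in order.
        return mlContext.Data
            .CreateEnumerable<PrevTwoSumOutput>(transformed, reuseRowObject: false)
            .Select(r => r.PrevTwoSum)
            .ToList();
    }

    public static void Main()
    {
        var rows = new[] { 1f, 2f, 3f, 4f }.Select(v => new ValueRow { Value = v });
        Console.WriteLine(string.Join(", ", Run(rows)));
    }
}
```

For the strict Fibonacci variant, where each value is the sum of the new column's own two previous outputs, the same state could hold the last two outputs instead of the last two inputs.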
The documentation of the Concatenate function might help as well!