
ML.NET IDataView back to CSV

Assume I have this sample data:

Sample.csv:

Dog,25
Cat,23
Cat,20
Dog,0

I want to load it into an IDataView, transform it so that it is ready for ML (no string columns and so on), and then save it again as a .csv file, for example to analyze it with another tool or language.

// Load data:
var sampleCsv = Path.Combine("Data", "Sample.csv");
var columns = new[]
{
    new TextLoader.Column("type", DataKind.String, 0),
    new TextLoader.Column("age", DataKind.Int16, 1),
};
var mlContext = new MLContext(seed: 0);
var dataView = mlContext.Data.LoadFromTextFile(sampleCsv, columns, separatorChar: ',');

// Transform
var pipeline =
    mlContext.Transforms.Categorical.OneHotEncoding("type",
        // OutputKind.Key yields a single key-valued column; the other output kinds yield a multi-slot vector:
        outputKind: OneHotEncodingEstimator.OutputKind.Key);
var transformedDataView = pipeline.Fit(dataView).Transform(dataView);
//  transformedDataView:
//  Dog,1,25
//  Cat,2,23
//  Cat,2,20
//  Dog,1,0

How can I get the two numeric columns and write them to the .csv file?

You can create a class for your output data:

class TempOutput
{
    // Note that the property types must match the column types in the DataView
    public UInt32 type { get; set; }
    public Int16 age { get; set; }
}
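If you are unsure which types the properties need, one way to check is to list the schema of the transformed IDataView. The following is a minimal sketch; the helper name `SchemaInspector.Describe` is hypothetical and not part of ML.NET, and it only reads the `Name`, `Type` and `RawType` members that the schema exposes.

```csharp
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;

public static class SchemaInspector
{
    // Hypothetical helper: returns one line per visible column with its
    // ML.NET type and the underlying .NET raw type.
    public static List<string> Describe(IDataView data)
    {
        return data.Schema
            .Where(column => !column.IsHidden)
            .Select(column => $"{column.Name}: {column.Type} (raw type {column.Type.RawType.Name})")
            .ToList();
    }
}
```

Calling `SchemaInspector.Describe(transformedDataView)` and printing the result shows the raw type to use for each property of the output class.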

Then use CreateEnumerable<> to read all rows from the DataView and write them to a .csv file:

File.WriteAllLines(sampleCsv + ".output",
    mlContext.Data.CreateEnumerable<TempOutput>(transformedDataView, reuseRowObject: false)
    .Select(t => string.Join(',', t.type, t.age)));
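As an alternative that avoids the helper class, ML.NET also provides `SelectColumns` and `SaveAsText`, which can write an IDataView to a stream directly. The sketch below is one possible way to combine them; the wrapper `DataViewCsvExport.SaveColumnsAsCsv` is a hypothetical name introduced here for illustration. How key-typed values such as `type` are rendered in the text output is not verified here and may differ from the raw key values, so check the result before relying on it.

```csharp
using System.IO;
using Microsoft.ML;

public static class DataViewCsvExport
{
    // Hypothetical wrapper: keeps only the named columns and writes them
    // as comma-separated text with a header row and no schema comment.
    public static void SaveColumnsAsCsv(
        MLContext mlContext, IDataView data, string path, params string[] columnNames)
    {
        // SelectColumns drops every column that is not listed.
        var selected = mlContext.Transforms
            .SelectColumns(columnNames)
            .Fit(data)
            .Transform(data);

        using (var stream = File.Create(path))
        {
            mlContext.Data.SaveAsText(
                selected, stream, separatorChar: ',', headerRow: true, schema: false);
        }
    }
}
```

With the variables from the question, a call could look like `DataViewCsvExport.SaveColumnsAsCsv(mlContext, transformedDataView, sampleCsv + ".output", "type", "age");`.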
