ML.NET IDataView back to csv
Assume I have this sample data:
Sample.csv:
Dog,25
Cat,23
Cat,20
Dog,0
I want to load it into an IDataView, transform it so that it is ready for ML (no strings and so on), then save it again as .csv, say to analyze it with another tool or language.
// Load data:
var sampleCsv = Path.Combine("Data", "Sample.csv");
var columns = new[]
{
    new TextLoader.Column("type", DataKind.String, 0),
    new TextLoader.Column("age", DataKind.Int16, 1),
};
var mlContext = new MLContext(seed: 0);
var dataView = mlContext.Data.LoadFromTextFile(sampleCsv, columns, ',');

// Transform:
var pipeline =
    mlContext.Transforms.Categorical.OneHotEncoding("type",
        // This outputKind adds just one column, while the others add several:
        outputKind: OneHotEncodingEstimator.OutputKind.Key);
var transformedDataView = pipeline.Fit(dataView).Transform(dataView);

// transformedDataView:
// Dog,1,25
// Cat,2,23
// Cat,2,20
// Dog,1,0
How can I get the two numeric columns and write them to a .csv file?
You can create a class for your output data:
class TempOutput
{
    // The property types must match the column types in the DataView
    public UInt32 type { get; set; }
    public Int16 age { get; set; }
}
Then use CreateEnumerable<> to read all rows from the DataView and write them to a .csv file:
// Needs using System.IO and System.Linq
File.WriteAllLines(sampleCsv + ".output",
    mlContext.Data.CreateEnumerable<TempOutput>(transformedDataView, false)
        .Select(t => string.Join(',', t.type, t.age)));
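If the other tool expects a header row, or the file may be written on a machine with a different locale, the line-building step can be pulled into a small helper. The following is only a sketch: OutputRow and CsvSketch are hypothetical names that do not come from ML.NET, OutputRow simply mirrors TempOutput above, and the header text "type,age" is an assumption about what the consuming tool wants.

```csharp
using System.Collections.Generic;
using System.Globalization;

// Hypothetical row type with the same shape as TempOutput in the answer.
public class OutputRow
{
    public uint type { get; set; }
    public short age { get; set; }
}

// Hypothetical helper, not part of ML.NET.
public static class CsvSketch
{
    // Yields one header line, then one comma-separated line per row.
    public static IEnumerable<string> ToCsvLines(IEnumerable<OutputRow> rows)
    {
        yield return "type,age";
        foreach (var row in rows)
        {
            // InvariantCulture keeps the number format independent of the machine's locale.
            yield return string.Join(",",
                row.type.ToString(CultureInfo.InvariantCulture),
                row.age.ToString(CultureInfo.InvariantCulture));
        }
    }
}
```

Assuming OutputRow binds to the DataView the same way TempOutput does, the result of CreateEnumerable<OutputRow>(transformedDataView, false) could be passed to CsvSketch.ToCsvLines and the returned lines handed to File.WriteAllLines.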