Smile Scala API: create a DataFrame from an Array

I am trying to integrate Smile into my Scala code base. In particular, I would like to train a random forest classifier. The FAQ says:

Most Smile algorithms take simple double[] as input. So you can use your favorite methods or library to import the data as long as the samples are in double arrays.

But this does not seem to be the case for RandomForest: all fit methods seem to take a Formula and a DataFrame as input. In my case, I have two Array[Array[Double]] containing examples of two different classes; the first should be labelled 0 and the second 1, for example. The first array has shape (n_samples_0, n_features) and the second (n_samples_1, n_features).

To the best of my knowledge, the only way to train a Smile randomForest on this data is to first convert these two arrays into one Smile DataFrame with n_features + 1 columns (one per feature plus one for the label) and n_samples_0 + n_samples_1 rows, and then:

val formula: Formula = "class" ~
val rf = randomForest(formula, df)

Hence my question: is there a way to create a DataFrame from an array in the Scala API? I can only find ways to create a DataFrame by reading different file formats.

I managed to solve my issue by using the of method of Smile's DataFrame.

Here is a minimal example. X1 and X0 are lists of arrays of doubles containing the features, each inner array of size 600; X1 holds the examples of the positive class and X0 those of the negative class.

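import scala.language.postfixOps  // may be required for the postfix "class" ~ syntax, depending on the Scala version
import smile.classification._
import smile.data.DataFrame
import smile.data.formula._

val X1: List[Array[Double]] = ???  // positive-class samples
val X0: List[Array[Double]] = ???  // negative-class samples
// One single-element Int row per sample, so the label ends up as an integer column
val y1 = X1.map(_ => Array(1))
val y0 = X0.map(_ => Array(0))
val X = (X1 ++ X0).toArray
val y = (y1 ++ y0).toArray
val dfX = DataFrame.of(X)
val dfy = DataFrame.of(y, "class")
val df = dfX.merge(dfy)  // combine the feature columns and the label column
val formula: Formula = "class" ~
val rf = randomForest(formula, df)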
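
Before handing the arrays to DataFrame.of, it can help to check that every sample has the same number of features and that the labels line up with the rows. The following is only a sketch in plain Scala with no Smile calls; LabelledSamples and stack are hypothetical names introduced here for illustration and are not part of Smile. It keeps the same ordering as the example above (positive class first, then negative).

```scala
object LabelledSamples {
  // Hypothetical helper (not part of Smile): stacks the positive samples (x1)
  // on top of the negative samples (x0) and builds one Int label row per sample.
  def stack(x1: Seq[Array[Double]], x0: Seq[Array[Double]]): (Array[Array[Double]], Array[Array[Int]]) = {
    val x = (x1 ++ x0).toArray
    require(x.nonEmpty, "at least one sample is needed")

    // Every row must have the same number of features.
    val nFeatures = x.head.length
    require(x.forall(_.length == nFeatures), s"all samples must have $nFeatures features")

    // Labels follow the same order as the rows: 1 for x1, then 0 for x0.
    val y = (x1.map(_ => Array(1)) ++ x0.map(_ => Array(0))).toArray
    (x, y)
  }
}
```

The two returned arrays correspond to X and y in the example above, so they could be passed to DataFrame.of in the same way.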
