
How to avoid hardcoding in column selection in a data frame in Apache Spark | Scala

I have the following data frame and I need to run logistic regression using Spark ML on it:

uid  a  b  c  label d
1    0  1  3  0     2
2    3  0  0  1     0

While using the ML package, I learned that I need to create the data in the following format:

label  feature
0      [0,1,3,2]
1      [3,0,0,0]

Now I came across VectorAssembler to create the feature column, and in doing so I need to do something like:

val assembler = new VectorAssembler()
.setInputCols(Array("a", "b", "c", "d"))
.setOutputCol("features")

Is there any way I can avoid hardcoding the individual feature column names?

It depends on your data. If you know that you will always have a certain set of columns that is not part of your feature vector (uid and label), and you can assume that all other columns are features, you can do it like this:

// df is your data frame
val assembler = new VectorAssembler()
.setInputCols(df.columns.diff(Array("uid","label")))
.setOutputCol("features")
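
For completeness, here is a minimal, self-contained sketch of how the dynamically assembled features can feed into logistic regression. It assumes Spark 2.x or later with the standard spark.ml API; the app name is made up for illustration, and the rows are just the sample data from the question:

import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("NoHardcodedFeatures").getOrCreate()
import spark.implicits._

// Sample data matching the question's layout
val df = Seq(
  (1, 0, 1, 3, 0, 2),
  (2, 3, 0, 0, 1, 0)
).toDF("uid", "a", "b", "c", "label", "d")
  .withColumn("label", $"label".cast("double")) // older Spark versions expect a double label

// Every column except uid and label is treated as a feature
val assembler = new VectorAssembler()
  .setInputCols(df.columns.diff(Array("uid", "label")))
  .setOutputCol("features")

val assembled = assembler.transform(df)

// Fit a logistic regression model on the assembled data
val lr = new LogisticRegression()
  .setLabelCol("label")
  .setFeaturesCol("features")
val model = lr.fit(assembled)

assembled.select("label", "features").show(false)

Because the input columns are derived from df.columns at runtime, adding or removing a feature column in the source data changes the feature vector automatically, with no code change.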
