Spark Scala: Dynamic column selection from DataFrame
I have a DataFrame with columns of different types. From those columns, I need to retrieve a specific subset. A hard-coded DataFrame select statement looks like this:
val logRegrDF = myDF.select(myDF("LEBEL_COLUMN").as("label"),
col("FEATURE_COL1"), col("FEATURE_COL2"), col("FEATURE_COL3"), col("FEATURE_COL4"))
Here LEBEL_COLUMN and the FEATURE_COLs are dynamic. I have an Array or Seq of the feature column names, like this:
val FEATURE_COL_ARR = Array("FEATURE_COL1","FEATURE_COL2","FEATURE_COL3","FEATURE_COL4")
I need to use this array of column names as the second part of that select statement. In the select, the first column is always the single label column (LEBEL_COLUMN), and the rest come from the dynamic list.
Can you please help me make this select statement work in Scala?
Note: the sample code below works, but it selects only the feature columns. I need the column array to be the second part of the select, after the label column:
val colNames = FEATURE_COL_ARR.map(name => col(name))
val logRegrDF = myDF.select(colNames:_*) // it is not the requirement
I think the code for the second part would look like this, but it does not work:
val logRegrDF = myDF.select(myDF("LEBEL_COLUMN").as("label"), colNames:_*)
If I understand your question correctly, I hope this is what you are looking for:
val allColumnsArr = "LEBEL_COLUMN" +: FEATURE_COL_ARR
result.select("LEBEL_COLUMN", allColumnsArr: _*)
.withColumnRenamed("LEBEL_COLUMN", "label")
Hope this helps!
Thanks a lot, @Shankar.
Your suggestion did not work as given, but it gave me an idea, and I solved the issue this way:
val allColumnsArr = "LEBEL_COLUMN" +: FEATURE_COL_ARR
val colNames = allColumnsArr.map(name => col(name))
myDF.select(colNames:_*).withColumnRenamed("LEBEL_COLUMN", "label")
This also works, without creating DataFrame Column objects:
result.select(LEBEL_COLUMN, FEATURE_COL_ARR: _*).withColumnRenamed(LEBEL_COLUMN, "label")
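A note on the attempt in the question, select(myDF("LEBEL_COLUMN").as("label"), colNames:_*): it most likely fails to compile because select is overloaded as select(cols: Column*) and select(col: String, cols: String*), so no variant takes one Column followed by Column varargs, and Scala only allows a `: _*` expansion to fill the whole varargs parameter. One way around this, sketched below, is to prepend the aliased label column to the feature columns and expand the combined sequence. The object name DynamicSelectSketch, the helper selectLabelAndFeatures, the sample rows, and OTHER_COL are hypothetical and only for illustration; the sketch assumes Spark SQL is on the classpath.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object DynamicSelectSketch {
  // Hypothetical helper: aliased label column first, then the dynamic feature columns.
  def selectLabelAndFeatures(df: DataFrame, labelCol: String, featureCols: Seq[String]): DataFrame = {
    // Build one Seq[Column] so the whole varargs parameter is filled by a single expansion.
    val selectCols: Seq[Column] = df(labelCol).as("label") +: featureCols.map(name => col(name))
    df.select(selectCols: _*)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("dynamic-select-sketch")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data standing in for myDF; OTHER_COL should be dropped by the select.
    val myDF = Seq((1.0, 0.1, 0.2, 0.3, 0.4, "unused"))
      .toDF("LEBEL_COLUMN", "FEATURE_COL1", "FEATURE_COL2", "FEATURE_COL3", "FEATURE_COL4", "OTHER_COL")

    val FEATURE_COL_ARR = Array("FEATURE_COL1", "FEATURE_COL2", "FEATURE_COL3", "FEATURE_COL4")

    val logRegrDF = selectLabelAndFeatures(myDF, "LEBEL_COLUMN", FEATURE_COL_ARR.toSeq)
    logRegrDF.printSchema()

    spark.stop()
  }
}
```

This keeps the alias inside the select itself, so the separate withColumnRenamed call is not needed.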