Spark Scala Dynamic column selection from DataFrame

I have a DataFrame with columns of different types. From those columns, I need to retrieve specific ones. A hard-coded DataFrame select statement would look like this:

val logRegrDF = myDF.select(myDF("LEBEL_COLUMN").as("label"),
col("FEATURE_COL1"), col("FEATURE_COL2"), col("FEATURE_COL3"), col("FEATURE_COL4"))

Here LEBEL_COLUMN and the FEATURE_COLs will be dynamic. I have an Array or Seq of those feature columns, like this:

val FEATURE_COL_ARR = Array("FEATURE_COL1","FEATURE_COL2","FEATURE_COL3","FEATURE_COL4")

I need to use this array of column names in the second part of that select statement. In the select, the first column will be a single column (LEBEL_COLUMN) and the rest will be the dynamic list.

Can you please help me make this select statement work in Scala?

Note: The sample code below works, but it selects only the feature columns; I need the column array to be the second part of the select:

val colNames = FEATURE_COL_ARR.map(name => col(name))
val logRegrDF = myDF.select(colNames:_*)  // it is not the requirement

I think the code for the second part would look like this, but it does not work:

val logRegrDF = myDF.select(myDF("LEBEL_COLUMN").as("label"), colNames:_*)

If I understand your question correctly, I hope this is what you are looking for:

val allColumnsArr = "LEBEL_COLUMN" +: FEATURE_COL_ARR
result.select("LEBEL_COLUMN", allColumnsArr: _*)
  .withColumnRenamed("LEBEL_COLUMN", "label")

Hope this helps!

Thanks a lot @Shankar.

Your suggestion did not work as given, but it gave me an idea, and I solved the issue this way:

val allColumnsArr = "LEBEL_COLUMN" +: FEATURE_COL_ARR
val colNames = allColumnsArr.map(name => col(name)) 
myDF.select(colNames:_*).withColumnRenamed("LEBEL_COLUMN", "label")
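
The attempt in the question most likely fails to compile because `select(cols: Column*)` takes a single varargs parameter, so a leading `Column` cannot be followed by a `: _*` expansion. A minimal sketch of an alternative is to prepend the aliased label column to the feature columns and expand the combined sequence, which avoids the separate rename. The object name, the helper `selectLabelAndFeatures`, the local SparkSession settings, and the sample rows are assumptions made for illustration only.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object DynamicSelectSketch {
  // Hypothetical helper: builds one Seq[Column] with the aliased label first,
  // then passes the whole sequence to select(cols: Column*).
  def selectLabelAndFeatures(df: DataFrame, labelName: String, featureNames: Seq[String]): DataFrame = {
    val labelCol: Column = col(labelName).as("label")
    val featureCols: Seq[Column] = featureNames.map(name => col(name))
    df.select((labelCol +: featureCols): _*)
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration; a real job would use its own configuration.
    val spark = SparkSession.builder().master("local[1]").appName("dynamic-select-sketch").getOrCreate()
    import spark.implicits._

    // Made-up sample data standing in for myDF.
    val myDF = Seq((1.0, 0.1, 0.2, 0.3, 0.4, "unused"))
      .toDF("LEBEL_COLUMN", "FEATURE_COL1", "FEATURE_COL2", "FEATURE_COL3", "FEATURE_COL4", "OTHER_COL")

    val FEATURE_COL_ARR = Array("FEATURE_COL1", "FEATURE_COL2", "FEATURE_COL3", "FEATURE_COL4")

    val logRegrDF = selectLabelAndFeatures(myDF, "LEBEL_COLUMN", FEATURE_COL_ARR.toSeq)
    logRegrDF.printSchema()

    spark.stop()
  }
}
```

Because the alias is applied to the `Column` before the select, the result should already carry the `label` name and no `withColumnRenamed` call is needed.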

This way also works, without creating DataFrame Column objects:

result.select(LEBEL_COLUMN, FEATURE_COL_ARR: _*)
  .withColumnRenamed(LEBEL_COLUMN, "label")
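
The name-based form works because `select` also has an overload taking `(col: String, cols: String*)`, so one name can come first and the rest can be expanded with `: _*`. The sketch below wraps that in a small helper and also shows the head/tail split that is commonly used when all names sit in one sequence. The object name, the helper `selectByName`, the local SparkSession settings, and the sample rows are hypothetical and only there to make the example self-contained.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object SelectByNameSketch {
  // Hypothetical helper: uses select(col: String, cols: String*) with the label
  // name first, then renames the label column afterwards.
  def selectByName(df: DataFrame, labelName: String, featureNames: Seq[String]): DataFrame =
    df.select(labelName, featureNames: _*).withColumnRenamed(labelName, "label")

  def main(args: Array[String]): Unit = {
    // Local session for illustration; a real job would use its own configuration.
    val spark = SparkSession.builder().master("local[1]").appName("select-by-name-sketch").getOrCreate()
    import spark.implicits._

    // Made-up sample data standing in for the DataFrame called result above.
    val result = Seq((1.0, 0.1, 0.2, "unused"))
      .toDF("LEBEL_COLUMN", "FEATURE_COL1", "FEATURE_COL2", "OTHER_COL")

    val selected = selectByName(result, "LEBEL_COLUMN", Seq("FEATURE_COL1", "FEATURE_COL2"))
    selected.printSchema()

    // When every name lives in a single sequence, split it into head and tail
    // so that the first name fills the mandatory String parameter.
    val allNames = Seq("LEBEL_COLUMN", "FEATURE_COL1", "FEATURE_COL2")
    result.select(allNames.head, allNames.tail: _*).printSchema()

    spark.stop()
  }
}
```

Note that the head/tail form passes each name exactly once; passing the full list after an explicit first name would select that first column twice.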
