I have a way to select a subset of columns from a DataFrame. This works:
val subset_cols = {joinCols :+ col}
val df1_subset = df1.select(subset_cols.head, subset_cols.tail: _*)
This doesn't work. The code compiles, but I get a runtime error:
val subset_cols = {joinCols :+ col}
val df1_subset = df1.select(subset_cols.deep.mkString(","))
Error:
Exception in thread "main" org.apache.spark.sql.AnalysisException:
cannot resolve '`first_name,last_name,rank_dr`' given input columns:
[model, first_name, service_date, rank_dr, id, purchase_date,
dealer_id, purchase_price, age, loyalty_score, vin_num, last_name, color];;
'Project ['first_name,last_name,rank_dr]
I'm trying to pass subset_cols to the .select method, but it seems I'm missing some kind of formatting.
What you're doing is:

df1.select("first_name,last_name,rank_dr")

so Spark tries to find a single column named "first_name,last_name,rank_dr", which does not exist.
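To see why, here is a minimal plain-Scala sketch (the column names are made up for illustration) showing that mkString collapses the whole sequence into one string, which select then treats as a single column name:

```scala
// Hypothetical column names, mirroring the error message above.
val joinCols = Array("first_name", "last_name")
val col = "rank_dr"
val subset_cols = joinCols :+ col

// mkString produces ONE string, not three separate arguments --
// Spark then looks for a single column with this literal name.
val collapsed = subset_cols.mkString(",")
println(collapsed) // first_name,last_name,rank_dr
```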
Instead, try:
val df1_subset = df1.selectExpr(subset_cols: _*)
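As a sketch of both working call shapes (the DataFrame contents here are invented for illustration), either pass the names as varargs to selectExpr, or use the head/tail form from the question:

```scala
import org.apache.spark.sql.SparkSession

// Minimal local session for illustration.
val spark = SparkSession.builder().master("local[*]").appName("subset").getOrCreate()
import spark.implicits._

val df1 = Seq(("Ada", "Lovelace", 1)).toDF("first_name", "last_name", "rank_dr")
val subset_cols = Array("first_name", "last_name") :+ "rank_dr"

// Option 1: selectExpr takes each name as a separate expression string.
val a = df1.selectExpr(subset_cols: _*)

// Option 2: the (String, String*) overload of select, as in the question.
val b = df1.select(subset_cols.head, subset_cols.tail: _*)

a.show()
b.show()
spark.stop()
```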