
How to concat all columns in a Spark DataFrame, using Java?

This is how I do it for 2 specific columns:

dataSet.withColumn("colName", concat(dataSet.col("col1"), lit(","),dataSet.col("col2") ));

but dataSet.columns() returns a String array, not a Column array. How should I create a List<Column>?

Thanks!

Simple way - instead of building a column list from df.columns, use concat_ws(',', *) inside a SQL expression. Check the code below.

df.withColumn("colName",expr("concat_ws(',',*)")).show(false)
+---+--------+---+-------------+
|id |name    |age|colName      |
+---+--------+---+-------------+
|1  |Srinivas|29 |1,Srinivas,29|
|2  |Ravi    |30 |2,Ravi,30    |
+---+--------+---+-------------+
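Since the question asks for Java, here is a minimal, self-contained sketch of the same concat_ws(',', *) approach from Java. The SparkSession setup, class name, and sample schema (id, name, age) are assumptions chosen to reproduce the output above:

import static org.apache.spark.sql.functions.expr;

import java.util.Arrays;
import java.util.List;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class ConcatAllColumns {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("concat-all-columns")   // hypothetical app name
                .master("local[*]")
                .getOrCreate();

        // Sample rows and schema matching the output shown above.
        List<Row> rows = Arrays.asList(
                RowFactory.create(1, "Srinivas", 29),
                RowFactory.create(2, "Ravi", 30));
        StructType schema = new StructType()
                .add("id", DataTypes.IntegerType)
                .add("name", DataTypes.StringType)
                .add("age", DataTypes.IntegerType);
        Dataset<Row> df = spark.createDataFrame(rows, schema);

        // Inside the SQL expression, "*" expands to every column of the
        // DataFrame, so no explicit List<Column> is needed.
        df.withColumn("colName", expr("concat_ws(',', *)")).show(false);

        spark.stop();
    }
}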


Java has a more verbose syntax. Try this:

 // Map each column name to a Column, convert the Java List to a Scala Seq,
 // and pass it to concat_ws.
 df.withColumn("colName",
         concat_ws(",",
                 toScalaSeq(Arrays.stream(df.columns())
                         .map(functions::col)
                         .collect(Collectors.toList()))));

Use the utility below to convert a Java list to a Scala Seq:

  // Convert a Java List to a Scala Seq (Buffer implements Seq).
  <T> Buffer<T> toScalaSeq(List<T> list) {
      return JavaConversions.asScalaBuffer(list);
  }
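If you would rather avoid the Scala conversion entirely, here is a sketch of an alternative: concat_ws in org.apache.spark.sql.functions is annotated with @scala.annotation.varargs, so its Column... overload can be called from Java with a plain array (the variable names here are illustrative):

import java.util.Arrays;
import org.apache.spark.sql.Column;
import org.apache.spark.sql.functions;

// Map every column name to a Column and collect into an array
// for the Column... varargs overload of concat_ws.
Column[] cols = Arrays.stream(df.columns())
        .map(functions::col)
        .toArray(Column[]::new);

df.withColumn("colName", functions.concat_ws(",", cols)).show(false);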

If someone is looking for a way to concat all the columns of a DataFrame in Scala, this is what worked for me:

val df_new = df.withColumn(new_column_name, concat_ws("-", df.columns.map(col): _*))
