
Concatenating Dataset columns in Apache Spark using Java by passing an array with column names as array elements?

Here is what I want to achieve using Java and Spark.

I have an array of column names as below.

String[] col_arr = new String[] { "colname_1", "colname_2"};

I want to concatenate the 2 columns by passing the array (with the column names as array elements) to the concat function.

Dataset<Row> new_abc = dataset_abc.withColumn("new_concat_Column", concat(col_arr));

The code below works, but I do not want to pass the column names explicitly; instead, I want to pass the array containing the column names as array elements.

Dataset<Row> new_abc = dataset_abc.withColumn("new_concat_Column", concat(col("colname_1"), col("colname_2")));

You can pass an array of columns (Column[]) to the concat function like so:

Column[] columnArray = { col("column1"), col("column2") };
Dataset<Row> concatenatedDS = dataset.withColumn("concatenated_column", concat(columnArray));

If you only have a String[] of column names, you can build the Column[] from it dynamically, as sketched below.
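For example, a minimal sketch (assuming a static import of org.apache.spark.sql.functions and reusing the dataset_abc and col_arr names from the question) could look like this:

import java.util.Arrays;
import org.apache.spark.sql.Column;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.concat;

String[] col_arr = new String[] { "colname_1", "colname_2" };

// Map each column name in the String[] to a Column and collect into a Column[]
Column[] columns = Arrays.stream(col_arr)
        .map(name -> col(name))
        .toArray(Column[]::new);

// concat accepts the Column[] as varargs, so no explicit column names are needed
Dataset<Row> new_abc = dataset_abc.withColumn("new_concat_Column", concat(columns));

Because concat is a Scala varargs method exposed to Java as concat(Column...), the Column[] can be passed directly in place of individual Column arguments.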
