Say I have a Dataset:
Dataset<Row> sqlDF = this.spark.sql("SELECT first_name, last_name, age FROM persons");
This returns a Dataset with three columns: first_name, last_name, and age.
I want to apply a function that adds 5 to the age column and returns a new Dataset with the same columns as the original, but with the age value changed:
public int add_age(int old_age) {
    return old_age + 5;
}
How do I do this with Apache Spark in Java?
I solved this by building a StructType with the three columns, then mapping each row to a newly constructed row with RowFactory, applying the function to the age column:
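// Needs imports: org.apache.spark.api.java.function.MapFunction, org.apache.spark.sql.RowFactory,
// org.apache.spark.sql.catalyst.encoders.RowEncoder, org.apache.spark.sql.types.DataTypes, org.apache.spark.sql.types.StructType
StructType customStructType = new StructType()
    .add("first_name", DataTypes.StringType, true)
    .add("last_name", DataTypes.StringType, true)
    .add("age", DataTypes.IntegerType, true);
// The MapFunction cast avoids ambiguity between the Scala and Java overloads of Dataset.map.
// getInt reads age as an int so it can be passed to add_age; it throws if age is null.
Dataset<Row> changed_data = sqlDF.map(
    (MapFunction<Row, Row>) row -> RowFactory.create(row.get(0), row.get(1), add_age(row.getInt(2))),
    RowEncoder.apply(customStructType));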
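The per-row logic inside the map call can be checked without a Spark session. The following is only a plain-Java sketch of that step, not Spark code: Object[] is a hypothetical stand-in for a Row, the class name AddAgeSketch and the helper mapRow are made up for illustration, and the sample rows are invented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class AddAgeSketch {
    // Same logic as add_age from the question.
    public static int addAge(int oldAge) {
        return oldAge + 5;
    }

    // Hypothetical stand-in for the map lambda: copies first_name and
    // last_name unchanged and applies addAge to the third field.
    // Assumes the third field is a non-null Integer.
    public static Object[] mapRow(Object[] row) {
        return new Object[] { row[0], row[1], addAge((Integer) row[2]) };
    }

    public static void main(String[] args) {
        // Invented sample rows in (first_name, last_name, age) order.
        List<Object[]> persons = Arrays.asList(
            new Object[] { "Ada", "Lovelace", 36 },
            new Object[] { "Alan", "Turing", 41 });

        // Build a new list rather than mutating the input, mirroring how
        // map returns a new Dataset.
        List<Object[]> changed = new ArrayList<>();
        for (Object[] row : persons) {
            changed.add(mapRow(row));
        }

        for (Object[] row : changed) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

Keeping the row transformation in a small static method like this makes it easy to unit test on its own, and a static method does not drag an enclosing object into the Spark closure when it is called from the lambda.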