Spark Java edit data in column
I would like to iterate through the content of a column in a Spark DataFrame and correct the data within a cell if it meets a certain condition.
+------------+
|column_title|
+------------+
|null        |
|0           |
|1           |
+------------+
Let's say I want to display something else when the value of the column is null. I tried with
Column.when()
Dataset.withColumn()
but I can't find the right method. I don't think it would be necessary to convert to an RDD and iterate through it.
You can use when and equalTo, or when and isNull.
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.when;

// Replace the value "bbb" with "ccc"; keep all other values unchanged.
Dataset<Row> df1 = df.withColumn("value", when(col("value").equalTo("bbb"), "ccc").otherwise(col("value")));

// Replace null values with "ccc"; keep all other values unchanged.
Dataset<Row> df2 = df.withColumn("value", when(col("value").isNull(), "ccc").otherwise(col("value")));
If you only want to replace null values, then you can also use na and fill.
// Replaces nulls in all string columns with "ccc".
Dataset<Row> df3 = df.na().fill("ccc");
Another way of doing this is to use a UDF.

Create the UDF:
private static UDF1<String, String> myUdf = new UDF1<String, String>() {
    @Override
    public String call(final String str) throws Exception {
        // any condition or custom function can be used here
        return StringUtils.rightPad(str, 25, 'A');
    }
};
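The padding logic inside call() can be sanity-checked without Spark on the classpath. Below is a minimal plain-Java sketch; the rightPad helper is our stand-in for org.apache.commons.lang3.StringUtils.rightPad, written out so the UDF body's behavior is easy to verify in isolation:

```java
public class RightPadDemo {
    // Minimal stand-in for StringUtils.rightPad(String, int, char):
    // pads str on the right with padChar up to length size.
    static String rightPad(String str, int size, char padChar) {
        if (str == null) {
            return null; // mirror StringUtils behavior: null in, null out
        }
        StringBuilder sb = new StringBuilder(str);
        while (sb.length() < size) {
            sb.append(padChar);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(rightPad("abc", 6, 'A'));      // abcAAA
        System.out.println(rightPad("abcdefgh", 6, 'A')); // abcdefgh (already long enough)
        System.out.println(rightPad(null, 6, 'A'));       // null
    }
}
```

Note that the UDF above returns null for null input, so nulls pass through it unchanged; combine it with na().fill or an isNull check if nulls also need rewriting.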
Register the UDF with the SparkSession:
sparkSession.udf().register("myUdf", myUdf, DataTypes.StringType);
Apply the UDF to the dataset (note the name must match the one used at registration):

dataset = dataset.withColumn("city", functions.callUDF("myUdf", col("city")));
Hope it helps!