How can I change a non-numeric value in the whole data set using Spark?
I'm working with a data set that has a lot of columns, and it contains ? values throughout. I would like Spark (Java) to change every ? to 0. So far I can only do this one column at a time, but I would like to do it everywhere:
Dataset<Row> csvData = spark.read()
        .option("header", false)
        .option("inferSchema", true)
        .option("maxColumns", 50000)
        .csv("src/main/resources/K9.data");

csvData = csvData
        .withColumn("_c5409", when(col("_c5409").isNull(), 0).otherwise(col("_c5409")))
        .withColumn("_c0", when(col("_c0").equalTo("?"), 0).otherwise(col("_c0")));
Maybe this has an easy solution; I'm new to Java and Spark :)
You can build a list of columns using when, and use that in a select if you need to handle complex if/else cases:
List<org.apache.spark.sql.Column> list = new ArrayList<org.apache.spark.sql.Column>();
for (String col : csvData.columns()) {
    list.add(when(csvData.col(col).isNull(), 0).otherwise(csvData.col(col)).alias(col));
}
csvData = csvData.select(list.toArray(new org.apache.spark.sql.Column[0]));
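Since the placeholder in the question is the literal string "?" rather than null, the same column-list pattern can cover both cases at once. A minimal self-contained sketch, assuming string columns; the class name and sample data are illustrative, not from the original post:

```java
import static org.apache.spark.sql.functions.lit;
import static org.apache.spark.sql.functions.when;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import org.apache.spark.sql.Column;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class ReplacePlaceholders {

    // For every column, map both null and the literal "?" to 0 and keep
    // everything else; Spark coerces the branches to a common type
    // (here the 0 becomes the string "0" in string columns).
    public static Dataset<Row> cleanAll(Dataset<Row> df) {
        List<Column> cols = new ArrayList<>();
        for (String c : df.columns()) {
            cols.add(
                when(df.col(c).isNull().or(df.col(c).equalTo("?")), lit(0))
                    .otherwise(df.col(c))
                    .alias(c));
        }
        return df.select(cols.toArray(new Column[0]));
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .master("local[1]").appName("clean").getOrCreate();

        // Hypothetical two-column sample standing in for the K9 data
        StructType schema = new StructType()
                .add("a", DataTypes.StringType)
                .add("b", DataTypes.StringType);
        List<Row> rows = Arrays.asList(
                RowFactory.create("?", "1"),
                RowFactory.create("2", null));

        Dataset<Row> cleaned = cleanAll(spark.createDataFrame(rows, schema));
        cleaned.show();  // "?" and null both come out as 0
        spark.stop();
    }
}
```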
If it is simply a matter of replacing nulls, this is good enough:
csvData = csvData.na().fill(0, csvData.columns());
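For the literal "?" placeholder specifically, na().replace offers a one-call alternative when the affected columns come in as strings; passing "*" as the column name applies the replacement to every compatible column. Note the replacement value must stay within the column's type, so it is the string "0" here. A sketch (class and method names are illustrative):

```java
import java.util.HashMap;
import java.util.Map;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

public class ReplaceWithNa {

    // Swap every "?" cell for "0" in all string columns of the Dataset;
    // nulls are left untouched (use na().fill for those).
    public static Dataset<Row> questionMarksToZero(Dataset<Row> df) {
        Map<String, String> replacement = new HashMap<>();
        replacement.put("?", "0");
        return df.na().replace("*", replacement);
    }
}
```

Combining this with na().fill(0, ...) handles both the "?" markers and any genuine nulls left by the CSV reader.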