
How can I change a non-numeric value across the whole data set using Spark?

I'm using a data set with a lot of columns, and the value "?" appears throughout it. I would like to use Spark (Java) to change every "?" to 0. So far I can only do this one column at a time, but I would like to do it everywhere:

    Dataset<Row> csvData = spark.read()
            .option("header", false)
            .option("inferSchema", true)
            .option("maxColumns", 50000)
            .csv("src/main/resources/K9.data");

    csvData = csvData.withColumn("_c5409", when(col("_c5409").isNull(),0).otherwise(col("_c5409")) )
        .withColumn("_c0", when(col("_c0").equalTo("?"),0).otherwise(col("_c0")) );

Maybe this has an easy solution; I'm new to Java and Spark :)

You can build a list of columns using when and pass that list to select. This approach also works if you have to deal with more complex if/else cases:

    // Build one expression per column: null becomes 0, any other value is kept.
    List<org.apache.spark.sql.Column> list = new ArrayList<org.apache.spark.sql.Column>();
    for (String col : csvData.columns()) {
        list.add(when(csvData.col(col).isNull(), 0).otherwise(csvData.col(col)).alias(col));
    }
    // Select all rewritten columns in a single pass.
    csvData = csvData.select(list.toArray(new org.apache.spark.sql.Column[0]));
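The same loop can also cover the "?" case from the question. The following is a minimal sketch, not a definitive implementation: it assumes every column is a string column, that Spark is on the classpath, and that all values other than "?" and null parse as numbers. The class name, the helper replaceEverywhere, and the two sample rows are made up for illustration. Both branches of when produce a double, so the result does not depend on implicit type coercion.

```java
import static org.apache.spark.sql.functions.lit;
import static org.apache.spark.sql.functions.when;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.Column;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

public class ReplaceQuestionMarks {

    // Hypothetical helper: maps "?" and null to 0.0 in every column and casts
    // all other values to double. Assumes all input columns are strings.
    static Dataset<Row> replaceEverywhere(Dataset<Row> df) {
        List<Column> cols = new ArrayList<>();
        for (String name : df.columns()) {
            Column c = df.col(name);
            cols.add(when(c.isNull().or(c.equalTo("?")), lit(0.0))
                    .otherwise(c.cast("double"))
                    .alias(name));
        }
        return df.select(cols.toArray(new Column[0]));
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .master("local[1]")
                .appName("replace-question-marks")
                .getOrCreate();

        // Made-up sample data standing in for the CSV file from the question.
        StructType schema = DataTypes.createStructType(new StructField[] {
                DataTypes.createStructField("_c0", DataTypes.StringType, true),
                DataTypes.createStructField("_c1", DataTypes.StringType, true)
        });
        List<Row> rows = Arrays.asList(
                RowFactory.create("1.5", "?"),
                RowFactory.create("?", null));
        Dataset<Row> df = spark.createDataFrame(rows, schema);

        replaceEverywhere(df).show();
        spark.stop();
    }
}
```

If the columns should stay strings instead, replacing lit(0.0) with lit("0") and dropping the cast keeps both branches of the same type.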

If you only need to replace nulls, this is enough:

    csvData = csvData.na().fill(0, csvData.columns());
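Note that na().fill with a numeric value only touches numeric columns, and a column containing "?" is normally inferred as a string. One possible way to make the one-liner cover the "?" case is to tell the CSV reader to treat "?" as null through the nullValue option, so that schema inference can still pick a numeric type. The sketch below assumes Spark is on the classpath; the class name, the helper readAndFill, and the in-memory sample lines are hypothetical and stand in for the real file.

```java
import java.util.Arrays;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class NullValueFill {

    // Hypothetical helper: parses CSV lines, reads "?" as null, then fills
    // the nulls in all numeric columns with 0.
    static Dataset<Row> readAndFill(SparkSession spark, Dataset<String> lines) {
        Dataset<Row> df = spark.read()
                .option("header", false)
                .option("inferSchema", true)
                .option("nullValue", "?")
                .csv(lines);
        return df.na().fill(0);
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .master("local[1]")
                .appName("null-value-fill")
                .getOrCreate();

        // Made-up sample lines; with a real file, pass its path to csv(...) instead.
        Dataset<String> lines = spark.createDataset(
                Arrays.asList("1,?,3", "?,5,6"), Encoders.STRING());

        readAndFill(spark, lines).show();
        spark.stop();
    }
}
```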
