
How can I filter on multiple columns in Spark using Java?

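    import org.apache.spark.SparkConf;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    SparkConf conf = new SparkConf()
                .setAppName("Text File Data Load")
                .setMaster("local")
                .set("spark.driver.host", "localhost")
                .set("spark.testing.memory", "2147480000");
    SparkSession spark = SparkSession.builder().config(conf).getOrCreate();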
    Dataset<Row> df = spark.read()
                .format("org.apache.spark.sql.execution.datasources.csv.CSVFileFormat")
                .option("sep", ",")
                .option("inferSchema", true)
                .option("header", true)
                .load("E:/CarsData.csv");
//   df_accounts.printSchema();
//   df_accounts.show(50);
//   df_accounts.persist();
//   df_accounts.cache(); 
//   df_accounts.count();
//  System.out.println("number of rows" + df_accounts.count());
//  System.out.println("distinct values "+df_accounts.description.distinct().count());
//  df.groupBy().agg(countDistinct("key", "value"));
//  System.out.println("number of distinct "+df_accounts.select("description").distinct().count());
//   df_accounts.col("description").show();
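    // My attempt, which does not compile: filter() expects a Column condition, but where() returns a Dataset<Row>,
    // and the two where() calls are not joined by any operator.
    df.filter(df.where(df.col("mileageFromOdometer").equalTo("Automatic")   df.where(df.col("description").equalTo("Honda")))).show();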
//  df.where(df.col("description").equalTo("Honda")).show();
    

Here is my DataFrame. I am trying to keep only the Honda cars and, from the modelDate column, only the used ones.

You can use the simple code below, which combines both column conditions with and() inside a single filter():

df.filter(df.col("description").equalTo("Honda").and(df.col("modelDate").equalTo("used")));
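
To check what the combined condition keeps without starting Spark, the same AND logic can be mimicked on an in-memory list. This is only a plain-Java sketch, not Spark code: the Car class, the hondaUsed helper, and the three sample rows are hypothetical stand-ins for two columns of CarsData.csv.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class CarFilterSketch {
    // Hypothetical stand-in for one row, holding only the two columns being filtered.
    static final class Car {
        final String description;
        final String modelDate;

        Car(String description, String modelDate) {
            this.description = description;
            this.modelDate = modelDate;
        }
    }

    // Keeps a row only when BOTH conditions hold, mirroring
    // col("description").equalTo("Honda").and(col("modelDate").equalTo("used")).
    static List<Car> hondaUsed(List<Car> cars) {
        Predicate<Car> isHonda = c -> "Honda".equals(c.description);
        Predicate<Car> isUsed = c -> "used".equals(c.modelDate);
        return cars.stream()
                .filter(isHonda.and(isUsed))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up sample rows, not taken from the original data set.
        List<Car> cars = Arrays.asList(
                new Car("Honda", "used"),
                new Car("Honda", "new"),
                new Car("Toyota", "used"));
        System.out.println(hondaUsed(cars).size()); // prints 1
    }
}
```

Only the first sample row satisfies both conditions. The Spark version behaves the same way per row: each condition is built separately and the two are joined with and() before filter() is called once.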

Or, if you prefer Spark SQL, register the DataFrame as a temporary view and query it:

df.createOrReplaceTempView("tempName");
spark.sql("select * from tempName where description = 'Honda' and modelDate = 'used'");
