SparkConf conf = new SparkConf().setAppName("Text File Data Load").setMaster("local").set("spark.driver.host","localhost").set("spark.testing.memory", "2147480000");
SparkSession spark = SparkSession.builder().config(conf).getOrCreate();
Dataset<Row> df = spark.read()
.format("org.apache.spark.sql.execution.datasources.csv.CSVFileFormat")
.option("sep", ",")
.option("inferSchema", true)
.option("header", true)
.load("E:/CarsData.csv");
// df_accounts.printSchema();
// df_accounts.show(50);
// df_accounts.persist();
// df_accounts.cache();
// df_accounts.count();
// System.out.println("number of rows" + df_accounts.count());
// System.out.println("distinct values "+df_accounts.description.distinct().count());
// df.groupBy().agg(countDistinct("key", "value"));
// System.out.println("number of distinct "+df_accounts.select("description").distinct().count());
// df_accounts.col("description").show();
df.filter(df.where(df.col("mileageFromOdometer").equalTo("Automatic") df.where(df.col("description").equalTo("Honda")))).show();
// df.where(df.col("description").equalTo("Honda")).show();
You can combine both conditions in a single filter with and():
df.filter(df.col("description").equalTo("Honda").and(df.col("modelDate").equalTo("used")));
Or, if you want to use Spark SQL, you can register the DataFrame as a temporary view and query it:
df.createOrReplaceTempView("tempName");
spark.sql("select * from tempName where description = 'Honda' and modelDate = 'used'");
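For reference, here is a minimal, self-contained sketch of both approaches side by side. It assumes Spark SQL is on the classpath. Since the contents of CarsData.csv are not shown, it uses a small in-memory dataset instead. The class name, the helper method names, the view name "cars", and the sample rows are all made up for illustration; only the column names description and modelDate come from the question.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class MultiConditionFilter {

    // Column API: chain the two conditions with and() inside one filter().
    static Dataset<Row> filterWithColumns(Dataset<Row> df) {
        return df.filter(
            df.col("description").equalTo("Honda")
              .and(df.col("modelDate").equalTo("used")));
    }

    // Spark SQL: register a temporary view, then express the same conditions in a WHERE clause.
    static Dataset<Row> filterWithSql(SparkSession spark, Dataset<Row> df) {
        df.createOrReplaceTempView("cars"); // hypothetical view name
        return spark.sql(
            "SELECT * FROM cars WHERE description = 'Honda' AND modelDate = 'used'");
    }

    // Hypothetical sample rows standing in for the CSV file from the question.
    static Dataset<Row> sampleData(SparkSession spark) {
        StructType schema = new StructType()
            .add("description", DataTypes.StringType)
            .add("modelDate", DataTypes.StringType);
        List<Row> rows = Arrays.asList(
            RowFactory.create("Honda", "used"),
            RowFactory.create("Honda", "new"),
            RowFactory.create("Toyota", "used"));
        return spark.createDataFrame(rows, schema);
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .appName("Multi-condition filter")
            .master("local")
            .getOrCreate();

        Dataset<Row> df = sampleData(spark);
        filterWithColumns(df).show();
        filterWithSql(spark, df).show();

        spark.stop();
    }
}
```

Both methods should return the same rows. The Column version is checked by the compiler for syntax, while the SQL version is often easier to read when there are many conditions.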