I am using Spark 2.2.0 and Scala 2.11.8 in the Spark shell. I have a data frame df, and I need to filter out the previous day's data based on the value of the column 'date' and then append that data to an HDFS location. (E.g. today is 2018-06-28, so I need the data of 2018-06-27.)
Below is the code:
df.filter($"date" === "2018-06-27").write.mode(SaveMode.Append).parquet("hdfs:/path..../date=2018-06-27")
I need the code above for automation, so I need to replace "2018-06-27" with a variable, both in the filter value and in the directory name. So if I have a string date_test: String = 2018-06-27, the code below should still work:
df.filter($"date" === "date_test").write.mode(SaveMode.Append).parquet("hdfs:/path..../date=date_test")
How can I do this?
You can apply the filter condition as below:
//Input
+----------+
| date|
+----------+
|2018-02-01|
|2017-01-02|
+----------+
//Solution:
val previousDate="'2018-02-01'"
df.filter(s"date=$previousDate").show
//Output:
+----------+
| date|
+----------+
|2018-02-01|
+----------+
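The key detail is that the SQL-string form of filter needs the single quotes around the date literal, which is why previousDate is defined with quotes embedded. As plain Scala (no Spark needed), the interpolation expands like this:

```scala
// The embedded single quotes make the interpolated value a valid SQL string literal
val previousDate = "'2018-02-01'"
val condition = s"date=$previousDate"

println(condition) // date='2018-02-01' — the predicate Spark's SQL parser receives
```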
For your case you can do it like this:

val datetest: String = "2018-02-01"
df.filter(s"date='$datetest'").write.mode(SaveMode.Append).parquet(s"hdfs:/path..../date=$datetest")
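Since the goal is automation, the previous day's date string can also be computed instead of hard-coded. A minimal sketch using java.time (the Spark calls are shown as comments so the snippet runs without a SparkSession; the path is the placeholder from the question):

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

// Yesterday's date, rendered in the same yyyy-MM-dd format as the 'date' column
val dateTest: String =
  LocalDate.now().minusDays(1).format(DateTimeFormatter.ISO_LOCAL_DATE)

// The computed string plugs into both the filter and the output path:
// df.filter(s"date='$dateTest'")
//   .write.mode(SaveMode.Append)
//   .parquet(s"hdfs:/path..../date=$dateTest")
```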