
Scala - How to pass a string value to a data frame filter (Spark-Shell)

I am using Spark 2.2.0 and Scala 2.11.8 in the Spark-Shell environment. I have a data frame df, and I need to filter the previous day's data based on the value of the column 'date' and then append that data to an HDFS location. (e.g. today is 2018-06-28, so I need the data of 2018-06-27.)

Below is the code:

 df.filter($"date" === "2018-06-27").write.mode(SaveMode.Append).parquet("hdfs:/path..../date=2018-06-27")

I need the code above for automation, so I need to replace "2018-06-27" with a variable for both the filter value and the directory name. So if I have a string date_test: String = 2018-06-27, the code below should still work:

 df.filter($"date" === "date_test").write.mode(SaveMode.Append).parquet("hdfs:/path..../date=date_test")

How can I do this?

You can apply the filter condition like below:

//Input
+----------+
|      date|
+----------+
|2018-02-01|
|2017-01-02|
+----------+

//Solution: 
 val previousDate = "'2018-02-01'"
 df.filter(s"date=$previousDate").show()

//Output: 
+----------+
|      date|
+----------+
|2018-02-01|
+----------+
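Equivalently, the same filter can be written with the Column API instead of a SQL expression string (a minimal sketch; note that here previousDate holds the bare date, without the embedded single quotes):

 // In spark-shell, spark.implicits._ is pre-imported, so the $"..." column syntax is available
 val previousDate = "2018-02-01"
 df.filter($"date" === previousDate).show()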

For your case, you can do it like this:

 val datetest: String = "2018-02-01"
 df.filter(s"date='$datetest'").write.mode(SaveMode.Append).parquet(s"hdfs:/path..../date=$datetest")
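If the previous day should be computed automatically rather than hard-coded, here is a small sketch using java.time (the hdfs:/path.... prefix is left as in the question and is assumed to be filled in with the real path):

 import java.time.LocalDate
 import org.apache.spark.sql.SaveMode

 // LocalDate.toString yields the ISO format used in the 'date' column, e.g. "2018-06-27"
 val datetest: String = LocalDate.now().minusDays(1).toString
 df.filter($"date" === datetest)
   .write.mode(SaveMode.Append)
   .parquet(s"hdfs:/path..../date=$datetest")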
