How to filter a dataframe in Spark Scala with relational operators as variables?
I have a dataframe as below:
myDF:
+-----+
|value|
+-----+
|8 |
|8 |
|1 |
+-----+
The program reads from another computed dataframe and gets the following two values:
val attr = 5
val opr = >
Now I need to filter myDF based on these values. So my result should look like:
resultDF:
+-----+----------+
|value|result |
+-----+----------+
|8 |GOOD |
|8 |GOOD |
|1 |BAD |
+-----+----------+
Code I used:
val resultDF = myDF.withColumn("result", when(col("value") > attr, "GOOD").otherwise("BAD"))
Now, attr and opr will change dynamically, meaning the operator can be any of >, <, >=, <=, <>.
The filter condition should change based on the operator I receive; that is, I need to use the variable in place of the operator. Can someone please advise? Something like:
val resultDF = myDF.withColumn("result", when(col("value") opr attr, "GOOD").otherwise("BAD"))
Firstly, as @Andrew said, it's a bad idea to use dynamic SQL without a good reason, because of undefined behavior and difficulties in debugging. Assuming you have joined the values with an operators dataframe, you can use this code:
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions._
import spark.implicits._

val appData: DataFrame = Seq(
  ("1", ">"),
  ("1", ">"),
  ("3", "<="),
  ("4", "<>"),
  ("6", ">="),
  ("6", "==")
).toDF("value", "operator")

val attr = 5

def compare(value: String, operator: String, sample: Int): String = {
  val isValueCorrectForAttr: Boolean = operator match {
    case ">"  => value.toInt > sample
    case "<"  => value.toInt < sample
    case ">=" => value.toInt >= sample
    case "<=" => value.toInt <= sample
    case "==" => value.toInt == sample
    case "<>" => value.toInt != sample
    case _    => throw new IllegalArgumentException(s"Wrong operator: $operator")
  }
  if (isValueCorrectForAttr) "GOOD" else "BAD"
}

val dynamic_compare = spark.udf.register("dynamic_compare", (v: String, op: String) => compare(v, op, attr))
appData.withColumn("result", dynamic_compare(col("value"), col("operator")))
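For reference (the original answer doesn't show it), printing the result of the snippet above with attr = 5 should give something like:

```scala
// Continuing from the snippet above (appData and dynamic_compare as defined there):
appData.withColumn("result", dynamic_compare(col("value"), col("operator"))).show()
// +-----+--------+------+
// |value|operator|result|
// +-----+--------+------+
// |    1|       >|   BAD|
// |    1|       >|   BAD|
// |    3|      <=|  GOOD|
// |    4|      <>|  GOOD|
// |    6|      >=|  GOOD|
// |    6|      ==|   BAD|
// +-----+--------+------+
```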
If you don't have an operator column and have just a single operator, it can be simpler:
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions._
import spark.implicits._

val appData: DataFrame = Seq(
  "1",
  "1",
  "3",
  "4",
  "6",
  "6"
).toDF("value")

val attr = 5
val op = ">"

def compare(value: String, operator: String, sample: Int): String = {
  val isValueCorrectForAttr: Boolean = operator match {
    case ">"  => value.toInt > sample
    case "<"  => value.toInt < sample
    case ">=" => value.toInt >= sample
    case "<=" => value.toInt <= sample
    case "==" => value.toInt == sample
    case "<>" => value.toInt != sample
    case _    => throw new IllegalArgumentException(s"Wrong operator: $operator")
  }
  if (isValueCorrectForAttr) "GOOD" else "BAD"
}

val dynamic_compare = spark.udf.register("dynamic_compare", (value: String) => compare(value, op, attr))
appData.withColumn("result", dynamic_compare(col("value")))
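As a side note (this is a sketch, not part of the original answer): the single-operator case can also be handled without a UDF by splicing the operator into a SQL expression string via `expr`, which keeps the predicate visible to Catalyst instead of hiding it inside an opaque function. The operator must be validated against a whitelist first, otherwise this becomes exactly the dynamic-SQL risk mentioned at the top of this answer:

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{expr, when}
import spark.implicits._

val attr = 5
val op = ">"

// Whitelist the operator before interpolating it into a SQL string.
val allowed = Set(">", "<", ">=", "<=", "==", "<>")
require(allowed.contains(op), s"Wrong operator: $op")

val appData: DataFrame = Seq("1", "1", "3", "4", "6", "6").toDF("value")

// Spark SQL will coerce the string column for the numeric comparison.
val resultDF = appData.withColumn(
  "result",
  when(expr(s"value $op $attr"), "GOOD").otherwise("BAD")
)
```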