
How to filter a dataframe in Spark scala with relational operators as variables?

I have a dataframe like the following:

myDF:

+-----+
|value|
+-----+
|8    |
|8    |
|1    |
+-----+

The program reads from other computed dataframes and obtains the following two values:

val attr = 5
val opr = ">"

Now I need to filter myDF based on these values, so that my result looks like this:

resultDF:
+-----+----------+
|value|result    |
+-----+----------+
|8    |GOOD      |
|8    |GOOD      |
|1    |BAD       |
+-----+----------+

The code I used:

val resultDF = myDF.withColumn("result", when(col("value") > attr, "GOOD").otherwise("BAD"))

Now, attr and opr will change dynamically. This means the operator can be any of >, <, >=, <=, <>.

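My filter condition should change according to the operator I receive. In other words, I need to use a variable for the operator, something like this (which does not compile as written):

val resultDF = myDF.withColumn("result", when(col("value") opr attr, "GOOD").otherwise("BAD"))

Can someone advise?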

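First, as @Andrew said, using dynamic SQL without a strong reason is a bad idea, because of undefined behavior and the difficulty of debugging. Assuming you have joined the values with the operators dataframe, you can use the following code: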

import org.apache.spark.sql.DataFrame
import spark.implicits._

val appData: DataFrame = Seq(
  ("1", ">"),
  ("1", ">"),
  ("3", "<="),
  ("4", "<>"),
  ("6", ">="),
  ("6", "==")
).toDF("value", "operator")

val attr = 5

def compare(value: String, operator: String, sample: Int): String = {
  val isValueCorrectForAttr: Boolean = operator match {
    case ">" => value.toInt > sample
    case "<" => value.toInt < sample
    case ">=" => value.toInt >= sample
    case "<=" => value.toInt <= sample
    case "==" => value.toInt == sample
    case "<>" => value.toInt != sample
    case _ => throw new IllegalArgumentException(s"Wrong operator: $operator")
  }
  if (isValueCorrectForAttr) "GOOD" else "BAD"
}

import org.apache.spark.sql.functions._
// Register a UDF that reads the operator from each row; attr is captured from the enclosing scope.
val dynamic_compare = spark.udf.register("dynamic_compare", (v: String, op: String) => compare(v, op, attr))
appData.withColumn("result", dynamic_compare(col("value"), col("operator")))
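The same per-row comparison can also be written with built-in column expressions instead of a UDF. This is only a minimal sketch under some assumptions that are not part of the answer above: the value column is cast to int, a local SparkSession is created for illustration (in spark-shell `spark` already exists), and rows with an unknown operator end up as BAD instead of throwing. The name `matches` is illustrative.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, when}

// Assumption: a local session for illustration only.
val spark = SparkSession.builder().master("local[*]").appName("operator-column").getOrCreate()
import spark.implicits._

val appData: DataFrame = Seq(
  ("1", ">"),
  ("3", "<="),
  ("4", "<>"),
  ("6", ">="),
  ("6", "==")
).toDF("value", "operator")

val attr = 5

val value = col("value").cast("int")
val operator = col("operator")

// One branch per supported operator. With no otherwise(), an unknown
// operator yields null, which the outer when() treats as "not matched".
val matches: Column =
  when(operator === ">", value > attr)
    .when(operator === "<", value < attr)
    .when(operator === ">=", value >= attr)
    .when(operator === "<=", value <= attr)
    .when(operator === "==", value === attr)
    .when(operator === "<>", value =!= attr)

val resultDF = appData.withColumn("result", when(matches, "GOOD").otherwise("BAD"))
resultDF.show()
```

Unlike the UDF version, this variant does not raise an IllegalArgumentException for an unsupported operator, so it may be worth validating the operator column separately if that behavior matters.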

If you do not have an operator column and only have a single operator, it can be simpler:

import org.apache.spark.sql.DataFrame
import spark.implicits._

val appData: DataFrame = Seq(
  "1",
  "1",
  "3",
  "4",
  "6",
  "6"
).toDF("value")

val attr = 5
val op = ">"

def compare(value: String, operator: String, sample: Int): String = {
  val isValueCorrectForAttr: Boolean = operator match {
    case ">" => value.toInt > sample
    case "<" => value.toInt < sample
    case ">=" => value.toInt >= sample
    case "<=" => value.toInt <= sample
    case "==" => value.toInt == sample
    case "<>" => value.toInt != sample
    case _ => throw new IllegalArgumentException(s"Wrong operator: $operator")
  }
  if (isValueCorrectForAttr) "GOOD" else "BAD"
}

import org.apache.spark.sql.functions._
// Register a UDF that takes only the value; op and attr are captured from the enclosing scope.
val dynamic_compare = spark.udf.register("dynamic_compare", (value: String) => compare(value, op, attr))
appData.withColumn("result", dynamic_compare(col("value")))
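When the operator is a single value known on the driver, one alternative is to resolve it once into a Column expression and skip the UDF altogether. The sketch below assumes the integer myDF from the question and a local SparkSession created for illustration; `conditionFor` is a hypothetical helper name, not something from the answer above.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, when}

// Assumption: a local session for illustration only.
val spark = SparkSession.builder().master("local[*]").appName("single-operator").getOrCreate()
import spark.implicits._

val myDF: DataFrame = Seq(8, 8, 1).toDF("value")
val attr = 5
val opr = ">"

// Hypothetical helper: translate the operator string into a Column
// condition on the driver, before any rows are processed.
def conditionFor(column: Column, operator: String, sample: Int): Column = operator match {
  case ">"  => column > sample
  case "<"  => column < sample
  case ">=" => column >= sample
  case "<=" => column <= sample
  case "==" => column === sample
  case "<>" => column =!= sample
  case other => throw new IllegalArgumentException(s"Wrong operator: $other")
}

val resultDF = myDF.withColumn(
  "result",
  when(conditionFor(col("value"), opr, attr), "GOOD").otherwise("BAD")
)
resultDF.show()
```

Because the match runs on the driver, an unsupported operator fails immediately when the condition is built rather than partway through a job.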
