Dynamic conditions generator in PySpark
I have developed modules according to the business requirements. Now, what I need is a dynamic condition generator or query generator. So, for example, consider the case below:
B1 = spark.sql("select * from xyz where ABC <> DEF and CONDITIONS1 or CONDITIONS2 or CONDITIONS3")
I have many different pieces of business logic like the one above. In this case I identified a common pattern, "select *", so I created a property file with a .properties extension and read that variable inside the .py file. Key-value pair in the properties file:
selectVar = "Select * from "
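A minimal sketch of how such a key could be read from a .properties file in Python. The file name query.properties and the helper load_properties are hypothetical names for illustration, and the parser assumes a simple key = value format:

```python
# Write a tiny example properties file so the sketch is self-contained
# ("query.properties" is a hypothetical name):
with open("query.properties", "w") as f:
    f.write('selectVar = "Select * from "\n')

def load_properties(path):
    """Parse a simple key = value .properties file into a dict."""
    props = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            # Skip blank lines and comment lines
            if not line or line.startswith(("#", "!")):
                continue
            key, _, value = line.partition("=")
            props[key.strip()] = value.strip().strip('"')
    return props

props = load_properties("query.properties")
select_var = props["selectVar"]  # 'Select * from '
```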
But now the requirement is to create a way or interface where users can modify the conditions as per their need, such as adding multiple conditions or removing one. In the example above they could remove CONDITIONS2, add a CONDITIONS4, or change CONDITIONS3 as needed. It should be dynamic: no coding should be required on the client side. They just want to pass conditions, which should be substituted into the query, and it should execute accordingly. So, how can I do this in PySpark? I tried searching for available tools for this case, but had no luck. Can anyone help me with the approach?
It is non-trivial to write a generic interface that parses every type of expression. However, for the specific case of multiple filter expressions you can do something like this:
def customExprEval(df: DataFrame, expr: String*): DataFrame = {
  expr.foldLeft(df){(d, i) => d.where(i)}
}
You can now call this function with a variable number of conditional expressions:
val B1 = spark.sql("select * from xyz")
val B2 = customExprEval(B1, "ABC <> DEF", "CONDITIONS1 or CONDITIONS2 or CONDITIONS3")
val B3 = customExprEval(B1, "ABC <> DEF")
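Since the question asks about PySpark, here is a rough Python equivalent of the Scala helper above, using functools.reduce in place of foldLeft (the name custom_expr_eval is just an illustrative choice; running the commented usage requires an active SparkSession):

```python
from functools import reduce

def custom_expr_eval(df, *exprs):
    """Apply each SQL filter expression in turn, like Scala's foldLeft."""
    return reduce(lambda d, e: d.where(e), exprs, df)

# Usage, mirroring the Scala calls above:
# B1 = spark.sql("select * from xyz")
# B2 = custom_expr_eval(B1, "ABC <> DEF", "CONDITIONS1 or CONDITIONS2 or CONDITIONS3")
# B3 = custom_expr_eval(B1, "ABC <> DEF")
```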