
Dynamic conditions generator in pyspark

I have developed modules according to the business requirements. Now, what I need is a dynamic condition generator or query generator. For example, consider the case below:

B1 = spark.sql("select * from xyz where ABC <> DEF and CONDITIONS1 or CONDITIONS2 or CONDITIONS3")  

I have many different pieces of business logic like the above. In this case, I identified a common pattern, such as "select *", so I created a property file with a .properties extension and read that variable inside the .py file:

Key-value pair in the properties file:

selectVar = "Select * from "   
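For context, reading such a .properties file in Python could look like the minimal sketch below (the load_properties helper and the query.properties filename are illustrative assumptions, not from the original post):

def load_properties(path):
    # Minimal .properties reader: parse "key = value" lines,
    # skipping blank lines and comments (hypothetical helper)
    props = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
                key, value = line.split("=", 1)
                props[key.strip()] = value.strip().strip('"')
    return props

props = load_properties("query.properties")
select_var = props["selectVar"]  # -> "Select * from "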

But now the demand is to create a way or an interface where users can modify the conditions as per their need. They can add multiple conditions or remove a condition: in the example above, they can remove CONDITIONS2, add CONDITIONS4, or change CONDITIONS3 as needed. It should be dynamic; no coding should be required on the client side. Users just want to pass conditions, which should be substituted into the query and executed accordingly. How can I do this in pyspark? I tried searching for available tools for this case, but with no luck. Can anyone help me with the approach?
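For illustration, the desired flow might look like the following sketch, reusing the load_properties helper above; the conditions key is hypothetical:

Key-value pair added to the properties file:

conditions = ABC <> DEF and CONDITIONS1 or CONDITIONS2 or CONDITIONS3

Substitution in the .py file:

B1 = spark.sql(props["selectVar"] + "xyz where " + props["conditions"])

Users can then edit the conditions value in the file without touching any code.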

It is non-trivial to write a generic interface that parses every type of expression. However, for the specific case of multiple filter expressions, you can do something like this (in Scala):

import org.apache.spark.sql.DataFrame

def customExprEval(df: DataFrame, expr: String*): DataFrame = {
  // Chain each filter expression onto the DataFrame with where()
  expr.foldLeft(df) { (d, e) => d.where(e) }
}

You can now call this function with a variable number of conditional expressions:

val B1 = spark.sql("select * from xyz")
val B2 = customExprEval(B1,  "ABC <> DEF", "CONDITIONS1 or CONDITIONS2 or CONDITIONS3")  
val B3 = customExprEval(B1, "ABC <> DEF")  
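Since the question asks for pyspark, a direct Python translation of the same idea is sketched below (assuming an active SparkSession named spark; custom_expr_eval is just a renamed equivalent of the Scala function above):

from functools import reduce
from pyspark.sql import DataFrame

def custom_expr_eval(df: DataFrame, *exprs: str) -> DataFrame:
    # Chain each SQL filter expression onto the DataFrame with where()
    return reduce(lambda d, e: d.where(e), exprs, df)

B1 = spark.sql("select * from xyz")
B2 = custom_expr_eval(B1, "ABC <> DEF", "CONDITIONS1 or CONDITIONS2 or CONDITIONS3")
B3 = custom_expr_eval(B1, "ABC <> DEF")

Each string is parsed by Spark's SQL expression parser, so anything valid in a WHERE clause can be passed.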
