I have a scenario where I need to dynamically join two DataFrames. I am creating a helper function that takes the DataFrames as input parameters, like this:
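from pyspark.sql import DataFrame
from pyspark.sql.functions import col
def joinDataFrame(first_df, second_df, first_cols, second_cols, join_type) -> DataFrame:
    # join() combines a list of Column conditions with AND
    return_df = first_df.join(second_df, [col(f) == col(s) for (f, s) in zip(first_cols, second_cols)], join_type)
    return return_df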
This works fine if I only have 'and' scenarios, but I have requirements to pass 'or' conditions as well.
I did try building a string containing the condition and wrapping it in expr(). I can pass the join condition that way, but I get a 'ParseException'.
I would prefer to build the 'join' condition and pass it as a parameter to this function.
Reduce using | on the zipped equality conditions:
from functools import reduce

join_cond = reduce(
    lambda x, y: x | y,
    (first_df[f] == second_df[s] for (f, s) in zip(first_cols, second_cols))
)
return_df = first_df.join(second_df, join_cond, join_type)
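Since the question asks for both 'and' and 'or' cases, the same reduce idea can be wrapped in a small helper. The following is only a sketch: build_join_condition and its combine parameter are hypothetical names, not part of PySpark, and it assumes first_df and second_df are Spark DataFrames, so that first_df[f] == second_df[s] yields a Column.

```python
from functools import reduce
import operator

# Hypothetical helper (not a PySpark API): builds one join condition from
# paired column names, combined with either AND or OR.
def build_join_condition(first_df, second_df, first_cols, second_cols, combine="and"):
    if not first_cols or len(first_cols) != len(second_cols):
        raise ValueError("column lists must be non-empty and of equal length")
    ops = {"and": operator.and_, "or": operator.or_}
    if combine not in ops:
        raise ValueError("combine must be 'and' or 'or'")
    # With Spark DataFrames, each comparison is a Column expression,
    # and & / | build a combined Column rather than a Python bool.
    pairs = (first_df[f] == second_df[s] for f, s in zip(first_cols, second_cols))
    return reduce(ops[combine], pairs)
```

The result can then be passed straight to the join, for example first_df.join(second_df, build_join_condition(first_df, second_df, first_cols, second_cols, "or"), join_type). Indexing each DataFrame directly, rather than using col(), keeps the two sides unambiguous when both DataFrames contain a column with the same name.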