I have two dataframes, left_df and right_df, that share common columns to join on: ['col_1', 'col_2'].
I also want to join on another condition: right_df.col_3.between(left_df.col_4, left_df.col_5)
Code:
from pyspark.sql import functions as F
join_condition = ['col_1',
'col_2',
right_df.col_3.between(left_df.col_4, left_df.col_5)]
df = left_df.join(right_df, on=join_condition, how='left')
df.write.parquet('/tmp/my_df')
But I got the error below:
TypeError: Column is not iterable
Why can't I combine those 3 conditions?
You cannot mix strings with Columns. The on argument must be either a list of strings (column names) or a list of Column expressions, not a mixture of both. Convert the first two items to Column expressions instead, e.g.:
from pyspark.sql import functions as F
join_condition = [left_df.col_1 == right_df.col_1,
left_df.col_2 == right_df.col_2,
right_df.col_3.between(left_df.col_4, left_df.col_5)]
df = left_df.join(right_df, on=join_condition, how='left')
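If there are many shared columns, the equality expressions can be generated from the list of names rather than written by hand. The following is a minimal sketch, not part of the PySpark API: build_join_condition is a hypothetical helper name, and it assumes both dataframes support df[name] lookups that return a Column, as PySpark DataFrames do.

```python
def build_join_condition(left_df, right_df, shared_cols, extra_conditions=()):
    """Return a join condition made of Column expressions only (no bare strings).

    Hypothetical helper for illustration; it only relies on df[name] lookups.
    """
    # Turn each shared column name into an explicit equality expression.
    condition = [left_df[name] == right_df[name] for name in shared_cols]
    # Append the remaining conditions, which are already Column expressions.
    condition.extend(extra_conditions)
    return condition


# Usage with the dataframes from the question (left as comments because it
# needs an active Spark session and the real left_df / right_df):
#
# join_condition = build_join_condition(
#     left_df, right_df, ['col_1', 'col_2'],
#     [right_df.col_3.between(left_df.col_4, left_df.col_5)],
# )
# df = left_df.join(right_df, on=join_condition, how='left')
# # Drop the right-hand copies of the key columns before writing.
# df = df.drop(right_df.col_1).drop(right_df.col_2)
```

Note that joining on Column expressions keeps both copies of col_1 and col_2 in the result, unlike joining on a list of names. The drop calls in the usage comments remove the right-hand copies, which is likely needed before df.write.parquet, since duplicate column names are normally rejected on write.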