
How to Join Pyspark Dataframe that is In Between 2 Columns of another Dataframe?

I have 2 dataframes: the first consists of 1 column of integers, and the second consists of 3 columns (integer_start, integer_end, animal).

Dataframes and their columns:

dataframe1 -> integer

dataframe2 -> integer_start, integer_end, animal

What I want to do is join these 2 dataframes such that if

dataframe1.integer is between dataframe2.integer_start and dataframe2.integer_end

then take dataframe1.integer and the corresponding dataframe2.animal and put them into a new dataframe called dataframe3.

Hope you can help me with this. I am using PySpark for this.

Hi, you can use a simple join with range conditions to do this.

result = dataframe1.join(
    dataframe2,
    [dataframe2.integer_start <= dataframe1.integer,
     dataframe2.integer_end >= dataframe1.integer],
    how='inner'
).select("integer", "animal")

This will give you exactly what you need.

If you want to exclude the boundary values, remove the = from <= and >= (i.e., use strict < and > instead).
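For reference, here is a minimal self-contained sketch of the same range join. The SparkSession setup and the sample rows are hypothetical illustrations, not from the original question; only the column names come from the question.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("range-join-example").getOrCreate()

# Hypothetical sample data matching the column names in the question.
dataframe1 = spark.createDataFrame([(3,), (7,), (15,)], ["integer"])
dataframe2 = spark.createDataFrame(
    [(1, 5, "cat"), (6, 10, "dog"), (11, 20, "bird")],
    ["integer_start", "integer_end", "animal"],
)

# Inner join on the range condition; the conditions in the list are AND-ed.
dataframe3 = dataframe1.join(
    dataframe2,
    [dataframe2.integer_start <= dataframe1.integer,
     dataframe2.integer_end >= dataframe1.integer],
    how="inner",
).select("integer", "animal")

dataframe3.show()
# Expected rows (order may vary):
# (3, cat), (7, dog), (15, bird)

One thing to keep in mind: because the join condition uses inequalities rather than equality, Spark cannot use a hash join here, so this can be slow on large datasets.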

