
How to Join Pyspark Dataframe that is In Between 2 Columns of another Dataframe?

I have 2 dataframes: the first consists of 1 column of integers, and the second consists of 3 columns (integer_start, integer_end, animal).

Dataframes and their columns:

dataframe1 -> integer

dataframe2 -> integer_start, integer_end, animal

What I want to do is join these 2 dataframes such that if

dataframe1.integer is between dataframe2.integer_start and dataframe2.integer_end

then take dataframe1.integer and the corresponding dataframe2.animal and put them into a new dataframe called dataframe3.

Hope you can help me with this. I am using PySpark for this.

Hi, you can use a simple join with range conditions to do this.

result = dataframe1.join(
    dataframe2,
    [dataframe2.integer_start <= dataframe1.integer,
     dataframe2.integer_end >= dataframe1.integer],
    how='inner'
).select("integer", "animal")

This will give you exactly what you need.

If you want to exclude the boundary values, remove the = from <= and >= (i.e., use strict < and > instead).
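For reference, here is a minimal self-contained sketch of the same range join. The SparkSession setup and the sample rows are hypothetical illustrations, not from the original question; only the column names come from the question.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("range-join-example").getOrCreate()

# Hypothetical sample data matching the column names in the question.
dataframe1 = spark.createDataFrame([(3,), (7,), (15,)], ["integer"])
dataframe2 = spark.createDataFrame(
    [(1, 5, "cat"), (6, 10, "dog"), (11, 20, "bird")],
    ["integer_start", "integer_end", "animal"],
)

# Inner join on the range condition; the conditions in the list are AND-ed.
dataframe3 = dataframe1.join(
    dataframe2,
    [dataframe2.integer_start <= dataframe1.integer,
     dataframe2.integer_end >= dataframe1.integer],
    how="inner",
).select("integer", "animal")

dataframe3.show()
# Expected rows (order may vary):
# (3, cat), (7, dog), (15, bird)

One thing to keep in mind: because the join condition uses inequalities rather than equality, Spark cannot use a hash join here, so this can be slow on large datasets.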

