I have 2 dataframes. The first consists of a single column of integers, and the second consists of 3 columns (integer_start, integer_end, animal).
dataframes and their columns
dataframe1 -> integer
dataframe2 -> integer_start, integer_end, animal
What I want to do is join these 2 dataframes so that, whenever
dataframe1.integer is between dataframe2.integer_start and dataframe2.integer_end,
dataframe1.integer and the corresponding dataframe2.animal are put into a new dataframe called dataframe3.
Hope you can help me with this. I am using PySpark for this.
You can do this with a join that uses a range condition instead of an equality condition.
result = dataframe1.join(
    dataframe2,
    [dataframe2.integer_start <= dataframe1.integer,
     dataframe2.integer_end >= dataframe1.integer],
    how='inner',
).select("integer", "animal")
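This returns one row for every integer that falls within a range, keeping only the integer and animal columns. If ranges in dataframe2 overlap, an integer that falls in several of them appears once per matching range.
The comparison above includes both endpoints. If you want to exclude the edge cases, change <= and >= to < and >.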
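To make the edge-case behaviour concrete, here is a minimal plain-Python sketch of what the join condition computes. It is not PySpark and not how Spark executes the join. It only mirrors the comparison logic on a few made-up rows. The sample data and the helper name range_join are hypothetical, chosen for illustration.

```python
# Hypothetical sample rows, invented for illustration only.
integers = [1, 5, 10, 11, 25]
ranges = [
    # (integer_start, integer_end, animal)
    (1, 10, "cat"),
    (11, 20, "dog"),
]


def range_join(values, ranges, inclusive=True):
    """Pair each value with the animal of every range that contains it.

    Mirrors an inner join on a range condition: values that match
    no range are dropped, values that match several ranges appear
    once per match.
    """
    matches = []
    for value in values:
        for start, end, animal in ranges:
            if inclusive:
                hit = start <= value <= end   # same as <= and >= in the join
            else:
                hit = start < value < end     # same as < and > in the join
            if hit:
                matches.append((value, animal))
    return matches


inclusive_rows = range_join(integers, ranges)
exclusive_rows = range_join(integers, ranges, inclusive=False)

print(inclusive_rows)
print(exclusive_rows)
```

With the inclusive comparison, the endpoint values 1, 10 and 11 are matched. With the strict comparison they are dropped, so only values strictly inside a range remain. The value 25 matches no range and is absent either way, as in an inner join.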