How to join a PySpark dataframe on a value that lies between two columns of another dataframe?

Question

I have two dataframes: the first consists of a single column of integers, and the second consists of three columns (integer_start, integer_end, animal).

Dataframes and their columns:

dataframe1 -> integer

dataframe2 -> integer_start, integer_end, animal

What I want to do is join these two dataframes such that, if

dataframe1.integer is in between dataframe2.integer_start and dataframe2.integer_end

then dataframe1.integer and the corresponding dataframe2.animal are put into a new dataframe called dataframe3.

I hope you can help me with this. I am using PySpark.

Answer 1

Hi, you can do this with a simple join whose condition checks that the integer falls within the range.

result = dataframe1.join(
    dataframe2,
    [dataframe2.integer_start <= dataframe1.integer, dataframe2.integer_end >= dataframe1.integer],
    how='inner',
).select("integer", "animal")

This will give you exactly what you need: each integer paired with the animal of every range that contains it.
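
For readers who want to try the join end to end, here is a minimal sketch. It assumes a local Spark installation, and it swaps the two comparisons for Column.between, which is inclusive on both ends. The helper names range_join and demo, and the sample rows, are hypothetical and not part of the original answer.

```python
def range_join(numbers, ranges):
    """Pair each integer with the animal of every range that contains it."""
    # Column.between is inclusive on both ends: start <= integer <= end.
    in_range = numbers["integer"].between(
        ranges["integer_start"], ranges["integer_end"]
    )
    return numbers.join(ranges, on=in_range, how="inner").select(
        numbers["integer"], ranges["animal"]
    )


def demo():
    # Imported here so the helper above can be read without Spark installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.master("local[1]")
        .appName("range-join-sketch")
        .getOrCreate()
    )
    # Hypothetical sample data, not taken from the question.
    dataframe1 = spark.createDataFrame([(3,), (10,), (15,), (42,)], ["integer"])
    dataframe2 = spark.createDataFrame(
        [(1, 10, "cat"), (11, 20, "dog")],
        ["integer_start", "integer_end", "animal"],
    )
    dataframe3 = range_join(dataframe1, dataframe2)
    # Collect to plain tuples before stopping the session.
    rows = [(row["integer"], row["animal"]) for row in dataframe3.collect()]
    spark.stop()
    return rows
```

Calling demo() on a machine with PySpark and a JVM should return the pairs (3, cat), (10, cat) and (15, dog) in some order. The value 42 falls in no range, so the inner join drops it.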

Depending on whether you want to include the edge cases, you can remove the = in <= and >= so that values equal to integer_start or integer_end are excluded.
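
To make the effect of the boundary choice concrete, here is a small plain-Python sketch of the same matching rule. It is not Spark code, only a reference for checking expected results on tiny inputs. The function name reference_range_join and the sample ranges are hypothetical.

```python
def reference_range_join(integers, ranges, inclusive=True):
    """Plain-Python reference: pair each integer with every matching animal."""
    matches = []
    for value in integers:
        for start, end, animal in ranges:
            if inclusive:
                hit = start <= value <= end  # mirrors <= and >=
            else:
                hit = start < value < end  # mirrors < and >
            if hit:
                matches.append((value, animal))
    return matches


# Hypothetical ranges that share the endpoint 10.
ranges = [(1, 10, "cat"), (10, 20, "dog")]
print(reference_range_join([10], ranges, inclusive=True))
print(reference_range_join([10], ranges, inclusive=False))
```

With inclusive bounds, the value 10 matches both ranges and appears twice in the result. With strict bounds it matches neither. If your ranges touch at their endpoints, a half-open condition such as integer_start <= integer < integer_end may be the better fit.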

Answer 1
0 2020-01-06 05:41:15
