Merge two different dataframes in pyspark
I have two different dataframes: one holds date combinations and the other holds city pairs.
df_date_combinations:
+-------------------+-------------------+
|            fs_date|            ss_date|
+-------------------+-------------------+
|2022-06-01T00:00:00|2022-06-02T00:00:00|
|2022-06-01T00:00:00|2022-06-03T00:00:00|
|2022-06-01T00:00:00|2022-06-04T00:00:00|
+-------------------+-------------------+
city pairs:
+---------+--------------+---------+--------------+
|fs_origin|fs_destination|ss_origin|ss_destination|
+---------+--------------+---------+--------------+
|      TLV|           NYC|      NYC|           TLV|
|      TLV|           ROM|      ROM|           TLV|
|      TLV|           BER|      BER|           TLV|
+---------+--------------+---------+--------------+
I want to combine them so that every date combination is paired with every city pair, giving the following dataframe:
+----------+----------+---------+--------------+---------+--------------+
|   fs_date|   ss_date|fs_origin|fs_destination|ss_origin|ss_destination|
+----------+----------+---------+--------------+---------+--------------+
|2022-06-01|2022-06-02|      TLV|           NYC|      NYC|           TLV|
|2022-06-01|2022-06-03|      TLV|           NYC|      NYC|           TLV|
|2022-06-01|2022-06-04|      TLV|           NYC|      NYC|           TLV|
|2022-06-01|2022-06-02|      TLV|           ROM|      ROM|           TLV|
|2022-06-01|2022-06-03|      TLV|           ROM|      ROM|           TLV|
|2022-06-01|2022-06-04|      TLV|           ROM|      ROM|           TLV|
|2022-06-01|2022-06-02|      TLV|           BER|      BER|           TLV|
|2022-06-01|2022-06-03|      TLV|           BER|      BER|           TLV|
|2022-06-01|2022-06-04|      TLV|           BER|      BER|           TLV|
+----------+----------+---------+--------------+---------+--------------+
Thanks!
This sounds like a cross join.
df1.crossJoin(df2)
Pandas actually has built-in methods to do this; we use concat to concatenate the dataframes.
You can read how to do this here:
The part that is pertinent to you would be:
pd.concat([df_date_combinations, city_pairs], axis = 1)
Hope this helps!
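One caveat on the concat call above, as far as I can tell: with `axis=1`, `pd.concat` aligns the two frames on their index and places them side by side, so two three-row inputs give three rows, not the nine combinations the question asks for. The sketch below contrasts that with a cross merge. It assumes pandas 1.2 or later (needed for `how="cross"`) and uses sample data retyped from the question with the dates already trimmed to day precision.

```python
import pandas as pd

# Sample data retyped from the question (dates shortened for readability).
df_date_combinations = pd.DataFrame(
    {
        "fs_date": ["2022-06-01", "2022-06-01", "2022-06-01"],
        "ss_date": ["2022-06-02", "2022-06-03", "2022-06-04"],
    }
)
city_pairs = pd.DataFrame(
    {
        "fs_origin": ["TLV", "TLV", "TLV"],
        "fs_destination": ["NYC", "ROM", "BER"],
        "ss_origin": ["NYC", "ROM", "BER"],
        "ss_destination": ["TLV", "TLV", "TLV"],
    }
)

# concat with axis=1 pairs row i with row i: still 3 rows, 6 columns.
side_by_side = pd.concat([df_date_combinations, city_pairs], axis=1)

# A cross merge pairs every city row with every date row: 9 rows.
all_combinations = city_pairs.merge(df_date_combinations, how="cross")

# Put the date columns first to match the layout in the question.
all_combinations = all_combinations[
    list(df_date_combinations.columns) + list(city_pairs.columns)
]

print(side_by_side.shape)
print(all_combinations.shape)
```

If the data already lives in Spark, staying with `crossJoin` avoids collecting everything to the driver; the pandas route only makes sense for small frames.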