Merge two different dataframes in pyspark

I have two different dataframes: one holds date combinations and the other holds city pairs:

df_date_combinations:

+-------------------+-------------------+
|            fs_date|            ss_date|
+-------------------+-------------------+
|2022-06-01T00:00:00|2022-06-02T00:00:00|
|2022-06-01T00:00:00|2022-06-03T00:00:00|
|2022-06-01T00:00:00|2022-06-04T00:00:00|
+-------------------+-------------------+

city pairs:

+---------+--------------+---------+--------------+
|fs_origin|fs_destination|ss_origin|ss_destination|
+---------+--------------+---------+--------------+
|      TLV|           NYC|      NYC|           TLV|
|      TLV|           ROM|      ROM|           TLV|
|      TLV|           BER|      BER|           TLV|
+---------+--------------+---------+--------------+

I want to combine them so that every date combination is paired with every city pair, giving the following dataframe:

+----------+----------+---------+--------------+---------+--------------+
|   fs_date|   ss_date|fs_origin|fs_destination|ss_origin|ss_destination|
+----------+----------+---------+--------------+---------+--------------+
|2022-06-01|2022-06-02|      TLV|           NYC|      NYC|           TLV|
|2022-06-01|2022-06-03|      TLV|           NYC|      NYC|           TLV|
|2022-06-01|2022-06-04|      TLV|           NYC|      NYC|           TLV|
|2022-06-01|2022-06-02|      TLV|           ROM|      ROM|           TLV|
|2022-06-01|2022-06-03|      TLV|           ROM|      ROM|           TLV|
|2022-06-01|2022-06-04|      TLV|           ROM|      ROM|           TLV|
|2022-06-01|2022-06-02|      TLV|           BER|      BER|           TLV|
|2022-06-01|2022-06-03|      TLV|           BER|      BER|           TLV|
|2022-06-01|2022-06-04|      TLV|           BER|      BER|           TLV|
+----------+----------+---------+--------------+---------+--------------+

Thanks!

This sounds like a cross join: every row of one dataframe is paired with every row of the other.

df1.crossJoin(df2)

Pandas actually has built-in methods to do this: we use concat to concatenate the dataframes. You can read how to do this here:

The part that is relevant to you is:

import pandas as pd
pd.concat([df_date_combinations, city_pairs], axis=1)
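A small pandas check of what that call does on data shaped like the question's, assuming pandas 1.2 or later (needed for `merge(how="cross")`). With `axis=1`, `concat` places the frames side by side aligned on the index, so three rows in each input give three output rows; `merge(how="cross")` is the pandas operation that produces all nine combinations shown in the desired table.

```python
import pandas as pd

# Inputs shaped like the question's dataframes.
df_date_combinations = pd.DataFrame({
    "fs_date": pd.to_datetime(["2022-06-01"] * 3),
    "ss_date": pd.to_datetime(["2022-06-02", "2022-06-03", "2022-06-04"]),
})
city_pairs = pd.DataFrame({
    "fs_origin": ["TLV"] * 3,
    "fs_destination": ["NYC", "ROM", "BER"],
    "ss_origin": ["NYC", "ROM", "BER"],
    "ss_destination": ["TLV"] * 3,
})

# concat(axis=1) aligns rows by index: row i of one frame sits beside row i of the other.
side_by_side = pd.concat([df_date_combinations, city_pairs], axis=1)

# merge(how="cross") builds the Cartesian product of the two frames.
cross = df_date_combinations.merge(city_pairs, how="cross")

print(len(side_by_side))  # 3
print(len(cross))         # 9
```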

Hope this helps!

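Disclaimer: technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.

Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM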