PySpark: join two DataFrames

Suppose I have two DataFrames at different levels of granularity, like this:

df1
  Month       Day    Values
   Jan      Monday     65      
   Feb      Monday     66
   Mar      Tuesday    68
   Jun      Monday     58 
    

df2
  Month       Day     Hour
   Jan      Monday     5    
   Jan      Monday     5       
   Jan      Monday     8
   Feb      Monday     9
   Feb      Monday     9
   Feb      Monday     9
   Mar      Tuesday    10
   Mar      Tuesday    1
   Jun      Tuesday    2                 
   Jun      Monday     7             
   Jun      Monday     8        

I want to join df1 with df2 and transfer the 'Values' information to df2: every hour row in df2 should receive the value of its Month/Day from df1.

Expected output:

   final
      Month       Day     Hour     Value
       Jan      Monday     5         65
       Jan      Monday     5         65
       Jan      Monday     8         65
       Feb      Monday     9         66
       Feb      Monday     9         66
       Feb      Monday     9         66
       Mar      Tuesday    10        68
       Mar      Tuesday    1         68
       Jun      Monday     7         58             
       Jun      Monday     8         58

This should be a simple join:

df2 = df2.join(df1, on=['Month', 'Day'], how='inner')
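To show what that join computes without starting Spark, here is a plain-Python sketch of an inner equi-join on (Month, Day) using the data from the question. It indexes df1 by key and attaches the matching Values to every df2 row. It only illustrates the semantics of `df2.join(df1, on=['Month', 'Day'], how='inner')`; it is not how Spark executes the join, and the helper name `inner_join` is hypothetical.

```python
# Plain-Python sketch of an inner equi-join on (Month, Day); no Spark needed.
from collections import defaultdict

# (Month, Day, Values) rows from df1 and (Month, Day, Hour) rows from df2.
df1 = [("Jan", "Monday", 65), ("Feb", "Monday", 66),
       ("Mar", "Tuesday", 68), ("Jun", "Monday", 58)]
df2 = [("Jan", "Monday", 5), ("Jan", "Monday", 5), ("Jan", "Monday", 8),
       ("Feb", "Monday", 9), ("Feb", "Monday", 9), ("Feb", "Monday", 9),
       ("Mar", "Tuesday", 10), ("Mar", "Tuesday", 1), ("Jun", "Tuesday", 2),
       ("Jun", "Monday", 7), ("Jun", "Monday", 8)]


def inner_join(left, right):
    """Attach every matching right-hand value to each left row, keyed on (Month, Day)."""
    # Build phase: index the right-hand rows by the join key.
    index = defaultdict(list)
    for month, day, value in right:
        index[(month, day)].append(value)
    # Probe phase: one output row per matching pair; unmatched left rows are dropped.
    result = []
    for month, day, hour in left:
        for value in index.get((month, day), []):
            result.append((month, day, hour, value))
    return result


final = inner_join(df2, df1)
for row in final:
    print(row)
```

The ten printed rows match the expected output. Two caveats when doing this in Spark itself: Spark does not guarantee row order, so compare results as multisets rather than by position, and the joined column keeps df1's name `Values`; `withColumnRenamed('Values', 'Value')` would match the header in the expected output.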

The join produces every combination of matching rows. For example,

df1:
   Jan      Monday     65

df2: 
  Month       Day     Hour
   Jan      Monday     5    
   Jan      Monday     5  

Because both df2 entries match the df1 entry on Jan and Monday, every combination becomes part of the output:

      Month       Day     Hour     Value
       Jan      Monday     5         65
       Jan      Monday     5         65
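It follows that the size of an inner equi-join is predictable from the keys alone: for each key k, the output holds n1(k) × n2(k) rows, where n1 and n2 count the rows with that key in each input. A small stdlib sketch, assuming only the (Month, Day) keys from the question matter for the count (the function name `inner_join_size` is hypothetical):

```python
from collections import Counter

# Only the (Month, Day) keys matter for the row count; Values and Hour are omitted.
df1_keys = [("Jan", "Monday"), ("Feb", "Monday"), ("Mar", "Tuesday"), ("Jun", "Monday")]
df2_keys = [("Jan", "Monday"), ("Jan", "Monday"), ("Jan", "Monday"),
            ("Feb", "Monday"), ("Feb", "Monday"), ("Feb", "Monday"),
            ("Mar", "Tuesday"), ("Mar", "Tuesday"),
            ("Jun", "Tuesday"), ("Jun", "Monday"), ("Jun", "Monday")]


def inner_join_size(left_keys, right_keys):
    """Sum of n_left(k) * n_right(k) over all keys; missing keys count as 0."""
    left_counts = Counter(left_keys)
    right_counts = Counter(right_keys)
    return sum(n * right_counts[k] for k, n in left_counts.items())


print(inner_join_size(df1_keys, df2_keys))  # 10
```

Here that is 3 + 3 + 2 + 2 + 0 = 10 rows (Jun/Tuesday contributes nothing). The same arithmetic warns about duplicates: if df1 ever held two rows for the same Month and Day, every matching df2 row would appear twice, so deduplicate df1 first if that is not intended.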

Note: whether you join df1 onto df2 or vice versa, and whether you use an inner or a left join, depends on how you want to handle rows that have no match.
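In the question's data the only mismatch is the df2 row (Jun, Tuesday, 2), since df1 has no Jun/Tuesday entry. An inner join drops it, while a left join from df2 (`df2.join(df1, on=['Month', 'Day'], how='left')` in PySpark) keeps it with a null in the Values column. The plain-Python sketch below reproduces that difference on the June rows only, using `None` to stand in for Spark's null; the `join` helper is hypothetical.

```python
from collections import defaultdict

# The June rows from the question: (Month, Day, Values) and (Month, Day, Hour).
df1 = [("Jun", "Monday", 58)]
df2 = [("Jun", "Tuesday", 2), ("Jun", "Monday", 7), ("Jun", "Monday", 8)]


def join(left, right, how="inner"):
    """Equi-join on (Month, Day); how='left' keeps unmatched left rows with None."""
    index = defaultdict(list)
    for month, day, value in right:
        index[(month, day)].append(value)
    result = []
    for month, day, hour in left:
        matches = index.get((month, day), [])
        if matches:
            result.extend((month, day, hour, v) for v in matches)
        elif how == "left":
            # No partner in `right`: keep the row and fill the missing column with None.
            result.append((month, day, hour, None))
    return result


print(join(df2, df1, how="inner"))
print(join(df2, df1, how="left"))
```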
