PySpark: join two DataFrames

Assume I have two DataFrames at different levels of granularity, like this:

df1
  Month       Day    Value
   Jan      Monday     65
   Feb      Monday     66
   Mar      Tuesday    68
   Jun      Monday     58
    

df2
  Month       Day     Hour
   Jan      Monday     5    
   Jan      Monday     5       
   Jan      Monday     8
   Feb      Monday     9
   Feb      Monday     9
   Feb      Monday     9
   Mar      Tuesday    10
   Mar      Tuesday    1
   Jun      Tuesday    2                 
   Jun      Monday     7             
   Jun      Monday     8        

I want to join df1 with df2 and carry the 'Value' column over to df2, so that each hourly row in df2 gets the value of the df1 row with the same Month and Day.

Expected output:

final
  Month       Day     Hour     Value
   Jan      Monday     5         65
   Jan      Monday     5         65
   Jan      Monday     8         65
   Feb      Monday     9         66
   Feb      Monday     9         66
   Feb      Monday     9         66
   Mar      Tuesday    10        68
   Mar      Tuesday    1         68
   Jun      Monday     7         58
   Jun      Monday     8         58

This should be a simple join:

df2 = df2.join(df1, on=['Month', 'Day'], how='inner')

The join pairs every row of df2 with every row of df1 that has the same Month and Day. For example:

df1:
   Jan      Monday     65

df2: 
  Month       Day     Hour
   Jan      Monday     5    
   Jan      Monday     5  

Because both df2 rows match the single df1 row on Jan and Monday, each of them appears in the output with that row's value:

      Month       Day     Hour     Value
       Jan      Monday     5         65
       Jan      Monday     5         65

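Note: Which DataFrame goes on the left, and whether you use an inner or a left join, depends on how you want to handle rows without a match. An inner join drops them (as happens to the Jun/Tuesday row of df2, which is absent from the expected output), whereas a left join starting from df2 keeps them with a null Value.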
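
How the join type affects mismatches can be illustrated without Spark. The following plain-Python sketch mimics the pairing logic on a small subset of the example data; `join_rows` is a hypothetical helper written for this illustration, not a PySpark function, and it only models the `inner` and `left` cases.

```python
def join_rows(left, right, how="inner"):
    """Join (Month, Day, Hour) rows with (Month, Day, Value) rows on (Month, Day)."""
    # Index the right side by its join key; a key may map to several values.
    lookup = {}
    for month, day, value in right:
        lookup.setdefault((month, day), []).append(value)

    out = []
    for month, day, hour in left:
        matches = lookup.get((month, day), [])
        # Every matching right row produces one output row.
        for value in matches:
            out.append((month, day, hour, value))
        # A left join keeps unmatched left rows and fills the gap with None.
        if not matches and how == "left":
            out.append((month, day, hour, None))
    return out


df1_rows = [("Jan", "Monday", 65), ("Jun", "Monday", 58)]
df2_rows = [("Jan", "Monday", 5), ("Jan", "Monday", 5), ("Jun", "Tuesday", 2)]

inner = join_rows(df2_rows, df1_rows, how="inner")
left = join_rows(df2_rows, df1_rows, how="left")
print(inner)
print(left)
```

In this sketch the inner result contains only the two Jan/Monday rows, while the left result also keeps Jun/Tuesday with `None` in place of a value. In PySpark the corresponding switch is the `how` argument of `join`.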


 