
Pyspark - Subtract columns from two different dataframes

I have the two dataframes below:

df1:                                             df2:
  +------------+------------+-----------+        +-----------+-------------+-----------+    
  |  date      |Advertiser  |Impressions|        |   date    |Advertiser   |Impressions|
  +------------+------------+-----------+        +-----------+-------------+-----------+
  |2020-01-08  |b           |50035      |        | 2020-01-07|b            |10000      |
  |2020-01-08  |c           |70000      |        | 2020-01-07|c            |25260      |
  +------------+------------+-----------+        +-----------+-------------+-----------+ 
  

I would like to compute df1(Impressions) - df2(Impressions) for each Advertiser and save the result to a new dataframe, df3:

  +------------+------------+----------------+               
  |  date      |Advertiser  |diff Impressions|       
  +------------+------------+----------------+        
  |2020-01-08  |b           |40035           |        
  |2020-01-08  |c           |44740           |          
  +------------+------------+----------------+

You can join the two dataframes on the Advertiser column and select the columns you need:

df3 = df1.join(df2, 'Advertiser').select(
    df1.date, 
    'Advertiser', 
    (df1.Impressions - df2.Impressions).alias('diff Impressions')
)

df3.show()
+----------+----------+----------------+
|      date|Advertiser|diff Impressions|
+----------+----------+----------------+
|2020-01-08|         b|           40035|
|2020-01-08|         c|           44740|
+----------+----------+----------------+
