[英]Pyspark - Subtract columns from two different dataframes
I have two dataframes below:我在下面有两个数据框:
df1: df2:
+------------+------------+-----------+ +-----------+-------------+-----------+
| date |Advertiser |Impressions| | date |Advertiser |Impressions|
+------------+------------+-----------+ +-----------+-------------+-----------+
|2020-01-08 |b |50035 | | 2020-01-07|b |10000 |
|2020-01-08 |c |70000 | | 2020-01-07|c |25260 |
+------------+------------+-----------+ +-----------+-------------+-----------+
I would like to do df1(Impressions) - df2(Impressions), and save it to a new dataframe df3.我想做 df1(Impressions) - df2(Impressions),并将其保存到新的 dataframe df3。
+------------+------------+----------------+
| date |Advertiser |diff Impressions|
+------------+------------+----------------+
|2020-01-08 |b |40035 |
|2020-01-08 |c |44740 |
+------------+------------+----------------+
You can join the two dataframes using the Advertiser column and make appropriate selections:您可以使用 Advertiser 列连接两个数据框并做出适当的选择:
df3 = df1.join(df2, 'Advertiser').select(
df1.date,
'Advertiser',
(df1.Impressions - df2.Impressions).alias('diff Impressions')
)
df3.show()
+----------+----------+----------------+
| date|Advertiser|diff Impressions|
+----------+----------+----------------+
|2020-01-08| b| 40035|
|2020-01-08| c| 44740|
+----------+----------+----------------+
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.