
How to join between different elements of two Pyspark dataframes

I have two dataframes named df1 and df2, with the following contents.

df1:

line_item_usage_account_id  line_item_unblended_cost    name 
100000000001                12.05                       account1
200000000001                52                          account2
300000000003                12.03                       account3

df2:

accountname     accountproviderid   clustername     app_pmo     app_costcenter      line_item_unblended_cost
account1        100000000001        cluster1        111111      11111111            12.05
account2        200000000001        cluster2        222222      22222222            52

I need the join to also include the IDs from df1.line_item_usage_account_id that are not in df2.accountproviderid, like this:

accountname     accountproviderid   clustername     app_pmo     app_costcenter      line_item_unblended_cost
account1        100000000001        cluster1        111111      11111111            12.05
account2        200000000001        cluster2        222222      22222222            52
account3        300000000003        NA              NA          NA                  12.03

The ID "300000000003" from df1.line_item_usage_account_id is not found in df2.accountproviderid, so it is added to the new dataframe.

Any idea how to achieve this? I appreciate any help.

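You can use a right join here. With df1 on the right side, every df1 row is kept, and the df2 columns come back as null for accounts that df2 does not contain: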

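# df2 is the left side, df1 the right side of the join.
df2.join(df1, df2.accountproviderid == df1.line_item_usage_account_id, "right")\
    .drop("accountname", "accountproviderid")\
    .drop(df2["line_item_unblended_cost"])\
    .withColumnRenamed("line_item_usage_account_id", "accountproviderid")\
    .withColumnRenamed("name", "accountname")\
    .select("accountname", "accountproviderid", "clustername", "app_pmo",
            "app_costcenter", "line_item_unblended_cost").show()
# The second drop removes df2's copy of line_item_unblended_cost; with the cost
# column present in both dataframes, selecting it by name would be ambiguous.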

+-----------+-----------------+-----------+-------+--------------+------------------------+
|accountname|accountproviderid|clustername|app_pmo|app_costcenter|line_item_unblended_cost|
+-----------+-----------------+-----------+-------+--------------+------------------------+
|   account1|     100000000001|   cluster1| 111111|      11111111|                   12.05|
|   account2|     200000000001|   cluster2| 222222|      22222222|                    52.0|
|   account3|     300000000003|       null|   null|          null|                   12.03|
+-----------+-----------------+-----------+-------+--------------+------------------------+
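To make it clearer why account3 survives the join, here is a minimal plain-Python sketch of the right-join logic, using the sample rows from the question. It is only an illustration and not how Spark implements joins: `right_join_accounts` is a hypothetical helper, the cost column of df2 is left out for brevity, and the sketch assumes each ID appears at most once in df2. It fills the gaps with "NA" as in the desired output, whereas the Spark result above shows null.

```python
# Plain-Python illustration of right-join semantics (not the PySpark API).
# Sample rows are copied from the question; all names here are hypothetical.
df1_rows = [
    {"line_item_usage_account_id": "100000000001", "line_item_unblended_cost": 12.05, "name": "account1"},
    {"line_item_usage_account_id": "200000000001", "line_item_unblended_cost": 52, "name": "account2"},
    {"line_item_usage_account_id": "300000000003", "line_item_unblended_cost": 12.03, "name": "account3"},
]
df2_rows = [
    {"accountname": "account1", "accountproviderid": "100000000001",
     "clustername": "cluster1", "app_pmo": "111111", "app_costcenter": "11111111"},
    {"accountname": "account2", "accountproviderid": "200000000001",
     "clustername": "cluster2", "app_pmo": "222222", "app_costcenter": "22222222"},
]


def right_join_accounts(left_rows, right_rows, missing="NA"):
    """Keep every right-side (df1) row; take cluster details from the left side (df2)."""
    # Index the left side by its join key (assumes keys are unique in df2).
    left_by_id = {row["accountproviderid"]: row for row in left_rows}
    result = []
    for right in right_rows:
        # An empty dict stands in for "no matching df2 row".
        match = left_by_id.get(right["line_item_usage_account_id"], {})
        result.append({
            "accountname": right["name"],
            "accountproviderid": right["line_item_usage_account_id"],
            "clustername": match.get("clustername", missing),
            "app_pmo": match.get("app_pmo", missing),
            "app_costcenter": match.get("app_costcenter", missing),
            "line_item_unblended_cost": right["line_item_unblended_cost"],
        })
    return result


joined = right_join_accounts(df2_rows, df1_rows)
for row in joined:
    print(row)
```

In PySpark itself, the nulls could likely be turned into "NA" with `DataFrame.fillna("NA", subset=[...])` before calling `show()`. Note that a string fill value only applies to string-typed columns, so this assumes clustername, app_pmo and app_costcenter are strings.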
