
pyspark: Get columns based on other records

I have a DataFrame that looks like this:

membershipAccountNbr    cntryRetailChannelCustId
111590058               1010015900581000010101
214100897               1010041008972100010101
104100897               1010041008971000010101

And another that looks like this:

membershipAccountNbr    parentMembershipNbr
111590058               111590058
214100897               104100897

My goal is to produce a result that looks like this:

membershipAccountNbr    parentMembershipNbr    parentCustId
111590058               111590058              1010015900581000010101
214100897               104100897              1010041008971000010101

I tried using joins, but they give an ambiguity error. I am new to PySpark, please help.

Assuming df1 is:

+--------------------+------------------------+
|membershipAccountNbr|cntryRetailChannelCustId|
+--------------------+------------------------+
|           111590058|    10100159005810000...|
|           214100897|    10100410089721000...|
|           104100897|    10100410089710000...|
+--------------------+------------------------+

and df2 is:

+--------------------+-------------------+
|membershipAccountNbr|parentMembershipNbr|
+--------------------+-------------------+
|           111590058|          111590058|
|           214100897|          104100897|
+--------------------+-------------------+

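Then join df2 to df1 on the parent number rather than on membershipAccountNbr, so that the customer ID comes from the parent's row and not from the member's own row:

from pyspark.sql.functions import col

# Rename df1's columns so its key lines up with df2's parent column.
parents = df1.select(
    col("membershipAccountNbr").alias("parentMembershipNbr"),
    col("cntryRetailChannelCustId").alias("parentCustId"),
)
df2.join(parents, on="parentMembershipNbr", how="left").select(
    "membershipAccountNbr", "parentMembershipNbr", "parentCustId"
).show()

The result looks like this:

+--------------------+-------------------+--------------------+
|membershipAccountNbr|parentMembershipNbr|        parentCustId|
+--------------------+-------------------+--------------------+
|           111590058|          111590058|10100159005810000...|
|           214100897|          104100897|10100410089710000...|
+--------------------+-------------------+--------------------+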
