PySpark join with 2 lookup tables
I have sales data, and the product details are split across two lookup tables:
df_prod_lookup1
ID product description
1 cereal Minipack
2 canola bottle
4 rice bag
df_prod_lookup2
ID product description
6 glass bottle
8 plants hibiscus
10 tree banyan
sales_df
ID product
10 tree
1 cereal
4 rice
8 plants
Expected output:
ID product description
10 tree banyan
1 cereal Minipack
4 rice bag
8 plants hibiscus
I am supposed to look up each ID in lookup table 1 first, and use lookup table 2 only if the ID is not available in lookup table 1.
Lookup tables 1 and 2 have different column names and cannot be merged into one. Is there a way to check, for every record in the sales data, whether the ID is available in lookup table 1 and join on it, and otherwise fall back to lookup table 2? Thanks.
So far I have only managed a simple join with one lookup table:
df_final = sales_df.join(df_prod_lookup1, on=['ID'], how='left')
Regards
Left join first with lookup table 1, and then with lookup table 2.
The coalesce function then merges the description fields: it returns the first non-null value among its arguments, so lookup table 1 takes priority and lookup table 2 is only used where table 1 had no match.
from pyspark.sql.functions import coalesce

# Rename the non-key columns so the two lookup tables don't clash after the joins
df_prod_lookup1 = df_prod_lookup1.withColumnRenamed("product", "product1").withColumnRenamed("description", "description1")
df_prod_lookup2 = df_prod_lookup2.withColumnRenamed("product", "product2").withColumnRenamed("description", "description2")

# Edited based on comments:
# left join with lookup table 1, then with lookup table 2; coalesce picks the
# first non-null value, so table 1 wins and table 2 is only the fallback
(sales_df.join(df_prod_lookup1, on=['ID'], how='left')
    .join(df_prod_lookup2, on=['ID'], how='left')
    .withColumn('product', coalesce('product1', 'product2'))
    .withColumn('description', coalesce('description1', 'description2'))
    .drop('product1', 'product2', 'description1', 'description2')
    .show())
+---+-------+-----------+
| ID|product|description|
+---+-------+-----------+
| 8| plants| hibiscus|
| 1| cereal| Minipack|
| 10| tree| banyan|
| 4| rice| bag|
+---+-------+-----------+
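To see why the two left joins plus coalesce give lookup table 1 priority, here is a plain-Python model of the same logic. It is not Spark code: the dictionaries stand in for the DataFrames, a missing key plays the role of the null a left join produces, and it assumes each ID appears at most once per lookup table (in Spark, duplicate IDs in a lookup table would multiply rows instead).

```python
# Plain-Python model (not Spark) of: left join lookup 1, left join lookup 2,
# then coalesce(table1_value, table2_value) per column.
from typing import Optional


def coalesce(*values: Optional[str]) -> Optional[str]:
    """Return the first argument that is not None, like SQL/Spark COALESCE."""
    for value in values:
        if value is not None:
            return value
    return None


# Sample data from the question, keyed by ID -> (product, description).
lookup1 = {1: ("cereal", "Minipack"), 2: ("canola", "bottle"), 4: ("rice", "bag")}
lookup2 = {6: ("glass", "bottle"), 8: ("plants", "hibiscus"), 10: ("tree", "banyan")}
sales = [(10, "tree"), (1, "cereal"), (4, "rice"), (8, "plants")]


def enrich(sales_rows, table1, table2):
    result = []
    for sale_id, _sales_product in sales_rows:
        # A left join yields None (null) when the ID has no match in a table.
        product1, description1 = table1.get(sale_id, (None, None))
        product2, description2 = table2.get(sale_id, (None, None))
        # Table 1 is listed first, so its values win whenever they exist.
        result.append((
            sale_id,
            coalesce(product1, product2),
            coalesce(description1, description2),
        ))
    return result


for row in enrich(sales, lookup1, lookup2):
    print(row)
```

Unlike Spark's show(), this model keeps the sales rows in their input order; Spark does not guarantee row order after a join. An ID found in neither table ends up with null product and description in both versions, because product is overwritten by coalesce('product1', 'product2'). If you would rather keep the value already in sales_df, you could pass 'product' as a third argument to that coalesce call.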