How to join two DataFrames and replace one column conditionally in Spark
How can I use Spark join operations to combine two DataFrames into a new one?
Here are my two input PySpark DataFrames.
DataFrame 1
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

li = [('abc', 'xyz')]
liColumns = ["aid", "bid"]
tempDF = spark.createDataFrame(data=li, schema=liColumns)
tempDF.printSchema()
tempDF.show(truncate=False)
+---+---+
|aid|bid|
+---+---+
|abc|xyz|
+---+---+
DataFrame 2
other_li = [('abc', '111', 'desc111'), ('abc', '112', 'desc112'), ('xyz', 'A123', 'city'), ('xyz', 'A456', 'state'), ('xyz', 'A789', 'zip')]
otherColumns = ['real_aid', 'code', 'some_value']
otherDF = spark.createDataFrame(data=other_li, schema = otherColumns)
otherDF.printSchema()
otherDF.show(truncate=False)
+--------+----+----------+
|real_aid|code|some_value|
+--------+----+----------+
|abc     |111 |desc111   |
|abc     |112 |desc112   |
|xyz     |A123|city      |
|xyz     |A456|state     |
|xyz     |A789|zip       |
+--------+----+----------+
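Question: How can I combine the two to get a third DataFrame? I understand how to do this with append/union, but is there a way to do it with a join? Or is there a more efficient way? I need to do this on two large tables.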
Expected DataFrame
output_li = [('abc', '111', 'desc111'), ('abc', '112', 'desc112'), ('abc', 'A123', 'city'), ('abc', 'A456', 'state'), ('abc', 'A789', 'zip'), ('xyz', 'A123', 'city'), ('xyz', 'A456', 'state'), ('xyz', 'A789', 'zip')]
otherColumns = ['real_aid', 'code', 'some_value']
# use a separate name so the input otherDF is not overwritten
expectedDF = spark.createDataFrame(data=output_li, schema=otherColumns)
expectedDF.printSchema()
expectedDF.show(truncate=False)
+--------+----+----------+
|real_aid|code|some_value|
+--------+----+----------+
|abc     |111 |desc111   |
|abc     |112 |desc112   |
|abc     |A123|city      |
|abc     |A456|state     |
|abc     |A789|zip       |
|xyz     |A123|city      |
|xyz     |A456|state     |
|xyz     |A789|zip       |
+--------+----+----------+
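As I understand it, you want to join the two DataFrames on the real_aid and bid columns (otherDF.real_aid = tempDF.bid). Then, whenever a joined row's aid is not equal to its real_aid, you want to "expand" that row into two: the original row, plus a copy in which real_aid is replaced by aid.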
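As a plain-Python reference for this rule, the expected output can be derived from the sample rows without Spark, which can be handy for checking a Spark result on small inputs. This is only a sketch: `expand` is a hypothetical helper name, and it applies the "only when aid differs from real_aid" condition literally.

```python
# Plain-Python reference for the expansion rule (no Spark involved).
# Sample rows copied from the question.
temp_rows = [("abc", "xyz")]  # (aid, bid)
other_rows = [
    ("abc", "111", "desc111"),
    ("abc", "112", "desc112"),
    ("xyz", "A123", "city"),
    ("xyz", "A456", "state"),
    ("xyz", "A789", "zip"),
]  # (real_aid, code, some_value)


def expand(temp_rows, other_rows):
    """Keep every row of other_rows; for each row whose real_aid equals some bid,
    also emit a copy relabeled with the matching aid (hypothetical helper)."""
    # Map each bid to all aids that point at it.
    aids_by_bid = {}
    for aid, bid in temp_rows:
        aids_by_bid.setdefault(bid, []).append(aid)

    result = []
    for real_aid, code, some_value in other_rows:
        result.append((real_aid, code, some_value))  # the original row
        for aid in aids_by_bid.get(real_aid, []):
            if aid != real_aid:  # expand only when the ids differ
                result.append((aid, code, some_value))
    return result


for row in sorted(expand(temp_rows, other_rows)):
    print(row)
```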
In PySpark you can do it like this:
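from pyspark.sql import functions as F

(tempDF
    .withColumnRenamed("bid", "real_aid")                           # match tempDF.bid against otherDF.real_aid
    .join(otherDF, ["real_aid"], "right")                           # keep every otherDF row; aid is null when unmatched
    .withColumn("real_aid", F.explode(F.array("real_aid", "aid")))  # one output row per element: real_aid, then aid
    .drop("aid")
    .filter(F.col("real_aid").isNotNull())                          # drop rows produced from a null aid
    .show())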
+--------+----+----------+
|real_aid|code|some_value|
+--------+----+----------+
| abc| 111| desc111|
| abc| 112| desc112|
| xyz|A123| city|
| abc|A123| city|
| xyz|A456| state|
| abc|A456| state|
| xyz|A789| zip|
| abc|A789| zip|
+--------+----+----------+
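To see why the final filter is needed, the pipeline can be traced step by step in plain Python. This is a hedged model of the SQL semantics, not Spark code: `None` stands in for NULL, and `right_join` and `explode_and_filter` are hypothetical helper names. The two abc rows have no match in tempDF, so their aid is null, and `F.explode(F.array("real_aid", "aid"))` emits a null real_aid for each of them.

```python
# Plain-Python trace of the answer's pipeline; None plays the role of SQL NULL.
temp_rows = [("abc", "xyz")]  # (aid, bid)
other_rows = [
    ("abc", "111", "desc111"),
    ("abc", "112", "desc112"),
    ("xyz", "A123", "city"),
    ("xyz", "A456", "state"),
    ("xyz", "A789", "zip"),
]  # (real_aid, code, some_value)


def right_join(temp_rows, other_rows):
    """Model of withColumnRenamed("bid", "real_aid") + right join on real_aid."""
    joined = []
    for real_aid, code, some_value in other_rows:
        matches = [aid for aid, bid in temp_rows if bid == real_aid]
        # A right join keeps unmatched right-side rows, with a null left side.
        for aid in matches or [None]:
            joined.append((real_aid, aid, code, some_value))
    return joined


def explode_and_filter(joined):
    """Model of explode(array(real_aid, aid)) + drop("aid") + isNotNull filter."""
    exploded = []
    for real_aid, aid, code, some_value in joined:
        # explode yields one row per array element, including a None element.
        for value in (real_aid, aid):
            exploded.append((value, code, some_value))
    # The filter removes the rows created from a null aid.
    kept = [row for row in exploded if row[0] is not None]
    return exploded, kept


joined = right_join(temp_rows, other_rows)
exploded, result = explode_and_filter(joined)
print(len(joined), len(exploded), len(result))  # 5 10 8
```

Note that the Spark pipeline never compares aid with real_aid: if some tempDF row had aid equal to bid, each matching otherDF row would come out twice.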