There are two DataFrames:
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

Data1 = [("1", None, "a", "Kelvin"),
         ("2", "1", "b", "Ho"),
         ("2", "2", "b", "Ho"),
         ("7", "1", "c", "Shuai")]
col1 = ["ID", "s_name", "group", "name"]
tableA = spark.createDataFrame(data=Data1, schema=col1)
Data2 = [("1", "1", "bird"),
         ("2", None, "tiger")]
col2 = ["ID", "s_name", "classes"]
tableB = spark.createDataFrame(data=Data2, schema=col2)
When they are joined with
tableA.join(tableB, ["ID"], "left")
the columns of the new DataFrame are ['ID', 's_name', 'group', 'name', 's_name', 'classes'], so s_name appears twice.
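Rather than hard-coding the duplicated names, they can be computed from the two column lists before joining. This is a minimal sketch in plain Python; `overlapping_columns` is a hypothetical helper, and in PySpark you would pass it `tableA.columns` and `tableB.columns` (the literal lists below just mirror those two schemas).

```python
def overlapping_columns(left_cols, right_cols, join_keys):
    """Return non-key column names present in both inputs, in left-side order.

    After a join on `join_keys`, each of these names would appear twice in
    the result, which is what makes later references ambiguous.
    """
    keys = set(join_keys)
    right = set(right_cols)
    return [c for c in left_cols if c in right and c not in keys]


# Stand-ins for tableA.columns and tableB.columns from the example above.
ambiguous_name = overlapping_columns(
    ["ID", "s_name", "group", "name"],
    ["ID", "s_name", "classes"],
    ["ID"],
)
print(ambiguous_name)  # ['s_name']
```

The join keys are excluded because a join on a list of column names (as in `["ID"]`) already keeps a single copy of each key.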
Since tableA and tableB share the s_name column, tableB's copy is redundant, and it makes references ambiguous if I want to coalesce the joined DataFrame later. So I rename it first:
tableB = tableB.withColumnRenamed("s_name","s_name_2")
Then, after the join, I coalesce the two columns and drop the renamed one:
from pyspark.sql.functions import coalesce, col

val = "s_name"
tableA.join(tableB, ["ID"], "left").withColumn(val, coalesce(col(val), col(val + "_2"))).drop(val + "_2")
The problem with this approach appears when I want to handle every ambiguous column generically. For instance, if the shared column names are stored in a list called ambiguous_name, then I do:
for val in ambiguous_name:
    tableB = tableB.withColumnRenamed(val, val + "_2")
and then drop the renamed columns after the join:
joined_table = tableA.join(tableB, ["ID"], "left")
for val in ambiguous_name:
    joined_table = joined_table.drop(val + "_2")
But suppose tableB already contains a column that uses the _2 suffix:
Data2 = [("1", "1", "test", "bird"),
         ("2", None, "test2", "tiger")]
col2 = ["ID", "s_name", "s_name_2", "classes"]
tableB = spark.createDataFrame(data=Data2, schema=col2)
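Renaming s_name to s_name_2 would now clash with the existing s_name_2 column. Of course, I could use a different suffix instead:
tableB = tableB.withColumnRenamed(val, val + "_3")
but what happens if col2 = ["ID", "s_name", "s_name_3", "classes"]?
Is there a generic suffix-naming scheme that resolves this?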
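One generic option is to pick, for each ambiguous column, the first numbered suffix that is not already taken on either side. The sketch below is plain Python over column-name lists; `unique_suffixed_name` and `rename_plan` are hypothetical helpers, not PySpark APIs, and the `_<n>` numbering starting at 2 is an assumption chosen to match the question.

```python
def unique_suffixed_name(name, taken, start=2):
    """Return the first `name_<n>` (n >= start) that is not in `taken`."""
    n = start
    while f"{name}_{n}" in taken:
        n += 1
    return f"{name}_{n}"


def rename_plan(ambiguous, left_cols, right_cols):
    """Map each ambiguous right-side column to a suffixed name that collides
    with no column on either side, nor with a name already chosen here."""
    taken = set(left_cols) | set(right_cols)
    plan = {}
    for name in ambiguous:
        new_name = unique_suffixed_name(name, taken)
        plan[name] = new_name
        taken.add(new_name)  # reserve it so later picks cannot reuse it
    return plan


left = ["ID", "s_name", "group", "name"]
print(rename_plan(["s_name"], left, ["ID", "s_name", "s_name_2", "classes"]))  # {'s_name': 's_name_3'}
print(rename_plan(["s_name"], left, ["ID", "s_name", "s_name_3", "classes"]))  # {'s_name': 's_name_2'}
```

In PySpark the plan could then be applied by calling `tableB.withColumnRenamed(old, new)` for each pair, and after the join coalescing `col(old)` with `col(new)` before dropping `new`, as in the single-column version of the question.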
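If I understand you correctly, the issue can be resolved by aliasing the DataFrames, so that each side's columns can be referenced by a qualified name instead of being renamed with a suffix: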
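from pyspark.sql.functions import coalesce, col

new = tableA.alias('tableA').join(tableB.alias('tableB'), ["ID"], "left")
# withColumnRenamed() matches plain column names only, so select via the aliases instead
new = new.select("ID", coalesce(col("tableA.s_name"), col("tableB.s_name")).alias("s_name"), "group", "name", "classes")
new.show()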
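The alias-based select can also be generated for any pair of schemas, which sidesteps suffix collisions entirely because nothing is renamed. This is a minimal sketch under stated assumptions: `coalesce_select_exprs` is a hypothetical helper that only builds Spark SQL expression strings, and it assumes the join was made on a list of key names (so each key appears once) with the aliases `tableA` and `tableB`.

```python
def coalesce_select_exprs(left_alias, right_alias, left_cols, right_cols, join_keys):
    """Build Spark SQL expressions for DataFrame.selectExpr() after an aliased join.

    Shared non-key columns are coalesced (left value first); all other
    columns are selected once through their table alias.
    """
    def q(*parts):
        # Backtick-quote each identifier part; a literal backtick is doubled.
        return ".".join("`" + p.replace("`", "``") + "`" for p in parts)

    keys = list(join_keys)
    right = set(right_cols)
    left = set(left_cols)
    exprs = [q(k) for k in keys]  # join keys appear once after a join on key names
    for c in left_cols:
        if c in keys:
            continue
        if c in right:
            exprs.append(f"coalesce({q(left_alias, c)}, {q(right_alias, c)}) AS {q(c)}")
        else:
            exprs.append(q(left_alias, c))
    for c in right_cols:
        if c not in keys and c not in left:
            exprs.append(q(right_alias, c))
    return exprs


exprs = coalesce_select_exprs(
    "tableA", "tableB",
    ["ID", "s_name", "group", "name"],
    ["ID", "s_name", "s_name_2", "classes"],
    ["ID"],
)
for e in exprs:
    print(e)
```

With the aliased join from the answer, the result would then be obtained with `new.selectExpr(*exprs)`; an existing `s_name_2` column in tableB simply passes through as `tableB`.`s_name_2`.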