
How to join dataframes (from a collection of Datasets)?

Original post 2016-11-15 07:08:31 5 2 scala/ apache-spark/ apache-spark-sql

I am trying to figure out the best way to join n Spark DataFrames.

For example, List(df1, df2, df3, dfN), where every DataFrame has a date column that I can join on.

Would recursion be the way to go?

2 answers

Like this:

List(df1,df2,df3,dfN).reduce((a, b) => a.join(b, joinCondition))
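To connect this to the "Recursion?" part of the question: a reduce is the same left-to-right chain of joins that an explicit recursion would build, i.e. ((df1 join df2) join df3) join dfN. The following is only a sketch in Python (PySpark style) rather than Scala; the helper names `join_recursive` and `join_reduce` are made up for illustration, and `on` is assumed to be something `DataFrame.join` accepts, such as the name of the shared date column.

```python
from functools import reduce


def join_recursive(dfs, on):
    """Join a non-empty list of DataFrames by explicit recursion."""
    if not dfs:
        raise ValueError("need at least one DataFrame")
    if len(dfs) == 1:
        return dfs[0]
    # Join everything except the last frame first, then attach the last one.
    return join_recursive(dfs[:-1], on).join(dfs[-1], on)


def join_reduce(dfs, on):
    """Build the same left-nested chain of joins as a fold."""
    if not dfs:
        raise ValueError("need at least one DataFrame")
    return reduce(lambda a, b: a.join(b, on), dfs)
```

Both helpers issue the joins in the same order, so the reduce form is usually preferred simply because it is shorter.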

Here is the same answer as above, written for PySpark users.

from functools import reduce
from pyspark.sql.functions import coalesce

# dfslist: list of all the DataFrames that you want to join
mergedDf = reduce(
    lambda df1, df2: df1.join(df2, [df1.joinKey == df2.joinKey], "outer").select("*", coalesce(df1.joinKey, df2.joinKey).alias("joinKey")).drop(df1.joinKey).drop(df2.joinKey),
    dfslist)
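If every DataFrame uses the same name for the join column, a possibly simpler variant is to pass that name to `join` instead of an equality expression. PySpark then keeps a single key column in the result, so the coalesce/drop step should not be needed. This is a sketch under assumptions: the helper name `join_all` is hypothetical, and the shared column is assumed to be called "date" as in the question.

```python
from functools import reduce


def join_all(dfs, key="date", how="outer"):
    """Fold a non-empty list of DataFrames into one, joining pairwise on `key`."""
    if not dfs:
        raise ValueError("need at least one DataFrame")
    # Joining on a column *name* (not df1.key == df2.key) yields one `key`
    # column in the output instead of one per input DataFrame.
    return reduce(lambda left, right: left.join(right, on=key, how=how), dfs)
```

Usage would look like `mergedDf = join_all(dfslist, key="date", how="outer")`. Non-key columns that share a name across inputs would still be duplicated and need renaming beforehand.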


