
PySpark: joining more than 2 DataFrames

Suppose I have around 100 DataFrames. How can I combine them into a single one containing all the columns? My DataFrames look like this:

id  name  marks
00  abc   70
01  def   67
02  ghi   68
03  jkl   90


id  name  class
00  abc   A
01  def   B
02  ghi   B
03  jkl   A


id  name  std
00  abc   1
01  def   2
02  ghi   3
03  jkl   4


id  name  city
00  abc   mex
01  def   nyc
02  ghi   ind
03  jkl   aus

I have more than 50 such DataFrames, and only the last column differs from one to the next.

My question is: how can I build a single resulting DataFrame that looks like this:

 id  name  marks  class  std  city
 00  abc   70     A      1    mex
 01  def   67     B      2    nyc
 02  ghi   68     B      3    ind
 03  jkl   90     A      4    aus

You can join a few of them with nested Spark SQL queries, but joining fifty of them that way takes a very long time.

