
How do I use joinWith to join more than 2 datasets?

I want to achieve something like this:

x.joinWith(y, x("id") === y("fid"), "left_outer")
  .joinWith(z, x("id") === z("fid"))
  .map { case (x, y, z) => combineXYZ(x, y, z) }

When you use joinWith, what you get back is a new Dataset of Tuple2: (x, y). Its column names are therefore _1 and _2.

So when you do the second join, you need to reference a column of that tuple, not a column of one of the source datasets. Like this:

x.joinWith(y, x("id") === y("fid"), "left_outer").joinWith(z, $"_1.id" === z("fid"))

Now what you get is a Tuple2 whose first element is itself a tuple: ((x, y), z). So your map has to pattern-match on the nested shape:

.map { case ((x, y), z) => combineXYZ(x, y, z) }
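Putting the pieces together, here is a minimal end-to-end sketch. The case classes X, Y, Z and XYZ, their fields, and the body of combineXYZ are hypothetical and exist only for illustration; the question does not show them. It also assumes a local SparkSession. Because the first join is a left outer join, the y side is typically null for rows without a match, so the sketch wraps it in Option.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical record types, for illustration only.
case class X(id: Long, name: String)
case class Y(fid: Long, value: String)
case class Z(fid: Long, score: Double)
case class XYZ(id: Long, name: String, value: Option[String], score: Double)

object JoinWithThree {
  // Hypothetical stand-in for combineXYZ.
  // y may be null when the left outer join finds no match, hence Option(y).
  def combineXYZ(x: X, y: Y, z: Z): XYZ =
    XYZ(x.id, x.name, Option(y).map(_.value), z.score)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("joinWith-three")
      .getOrCreate()
    import spark.implicits._

    val x: Dataset[X] = Seq(X(1L, "a"), X(2L, "b")).toDS()
    val y: Dataset[Y] = Seq(Y(1L, "y1")).toDS()
    val z: Dataset[Z] = Seq(Z(1L, 0.5), Z(2L, 0.7)).toDS()

    // First join: the result columns are _1 (an X) and _2 (a Y).
    val xy: Dataset[(X, Y)] = x.joinWith(y, x("id") === y("fid"), "left_outer")

    // Second join: refer to the nested field through the tuple column _1.
    val xyz: Dataset[((X, Y), Z)] = xy.joinWith(z, $"_1.id" === z("fid"))

    // The element type is a nested tuple, so the pattern is nested too.
    val combined: Dataset[XYZ] = xyz.map { case ((a, b), c) => combineXYZ(a, b, c) }

    combined.show()
    spark.stop()
  }
}
```

The explicit type annotations on xy and xyz are optional, but they make the nesting visible and let the compiler catch a flat (x, y, z) pattern.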

This should work. Note that if you don't want to use $"_1.id", which is totally understandable, you can map after the first join to build a new object (something other than a Tuple2) so that the columns get meaningful names.
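A sketch of that alternative, under the same assumptions as before: X, Y and Z are hypothetical record types, and XY is a hypothetical intermediate case class introduced here to replace the tuple. After the map, the join key is a plain top-level column named id.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical record types, for illustration only.
case class X(id: Long, name: String)
case class Y(fid: Long, value: String)
case class Z(fid: Long, score: Double)
// Hypothetical intermediate type that replaces the (X, Y) tuple.
case class XY(id: Long, x: X, y: Option[Y])

object JoinWithNamedIntermediate {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("joinWith-named")
      .getOrCreate()
    import spark.implicits._

    val x: Dataset[X] = Seq(X(1L, "a"), X(2L, "b")).toDS()
    val y: Dataset[Y] = Seq(Y(1L, "y1")).toDS()
    val z: Dataset[Z] = Seq(Z(1L, 0.5), Z(2L, 0.7)).toDS()

    // Map the tuple into a named type; Option(b) absorbs a null right side.
    val xy: Dataset[XY] = x
      .joinWith(y, x("id") === y("fid"), "left_outer")
      .map { case (a, b) => XY(a.id, a, Option(b)) }

    // The join key is now an ordinary column, so no "_1.id" is needed.
    val xyz: Dataset[(XY, Z)] = xy.joinWith(z, xy("id") === z("fid"))

    xyz.show(truncate = false)
    spark.stop()
  }
}
```

The trade-off is an extra map, which deserializes and re-serializes each row, in exchange for readable column names in the second join condition.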
