
Spark - Scala - Join RDDs (CSV files)

I'm new to Scala and still in the early stages of learning it. I now need to join two datasets on a field, the way you would join two tables in a relational database.

Example:

Table 1 ( csv )

zip_type, primary_city, acceptable_cities, unacceptable_cities

Example:

Table 2 ( csv )

GEO.id, GEO.id2, GEO.display-label, VD01

Question:

I want to join column 1 (zip_type) of Table 1 with column 2 (GEO.id2) of Table 2.

Currently I:

  • Created an RDD from each CSV file
  • Processed each line with a CSV parser, but I am having trouble performing the join

What do I need to do next?

To perform a join you need pair RDDs that share the same key. Transform the first RDD into an RDD of (K, V) tuples keyed on the zip_type column, and transform the second RDD the same way with GEO.id2 as the key. You can then call join on the two pair RDDs.
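The approach above can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the object name `JoinCsvSketch`, the helpers `parse`, `keyByColumn` and `joinTables`, and the file names `table1.csv` and `table2.csv` are all hypothetical, and the comma split is a naive stand-in for the CSV parser mentioned in the question (it does not handle quoted fields).

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object JoinCsvSketch {
  // Naive split: assumes no quoted commas inside fields.
  // Swap in a real CSV parser for production data.
  def parse(line: String): Array[String] =
    line.split(",", -1).map(_.trim)

  // Turn each parsed row into a (key, row) pair, using the column at keyIndex as the key.
  // Rows too short to contain that column are dropped.
  def keyByColumn(rows: RDD[Array[String]], keyIndex: Int): RDD[(String, Array[String])] =
    rows.filter(_.length > keyIndex).map(row => (row(keyIndex), row))

  // Inner join: Table 1 keyed on its first column, Table 2 keyed on its second (GEO.id2).
  def joinTables(table1: RDD[String],
                 table2: RDD[String]): RDD[(String, (Array[String], Array[String]))] = {
    val left  = keyByColumn(table1.map(parse), 0)
    val right = keyByColumn(table2.map(parse), 1)
    left.join(right)
  }

  def main(args: Array[String]): Unit = {
    // Local master and file paths are assumptions for illustration only.
    val conf = new SparkConf().setAppName("join-csv-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val joined = joinTables(sc.textFile("table1.csv"), sc.textFile("table2.csv"))
      joined.take(10).foreach { case (key, (row1, row2)) =>
        println(key + " -> " + row1.mkString("|") + " / " + row2.mkString("|"))
      }
    } finally {
      sc.stop()
    }
  }
}
```

Because `join` is an inner join, the header rows fall out on their own as long as the two key column names differ (here `zip_type` and `GEO.id2`). If you need to keep rows that have no match, `leftOuterJoin` or `fullOuterJoin` on the same pair RDDs would be the alternatives to consider.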
