I have two CSV files (datasets), file1 and file2.
File1 consists of the following columns:
Orders  | Requests | Book1   | Book2
Varchar | Integer  | Integer | Integer
File2 consists of the following columns:
Book3  | Book4  | Book5   | Orders
String | String | Varchar | Varchar
How can I combine the data in the two CSV files in Scala to check:
You can join the two CSV files by turning each one into a pair RDD keyed on the column they share (here, Orders) and then calling join.
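// `job`, `leftFile`, and `LineParser` come from the surrounding application and are not shown here.
// Key each line of the right-hand file by its join column.
val rightFile = job.patch.get.file
val rightFileByKeys = sc.textFile(rightFile).map { line =>
  new LineParser(line, job.patch.get.patchKeyIndex, job.delimRegex, Some(job.patch.get.patchValueIndex))
}.keyBy(_.getKey())

// Key each line of the left-hand file the same way.
val leftFileByKeys = sc.textFile(leftFile).map { line =>
  new LineParser(line, job.patch.get.fileKeyIndex, job.delimRegex)
}.keyBy(_.getKey())

// Inner join on the key, then append the right-hand value to the left-hand line.
leftFileByKeys.join(rightFileByKeys).map { case (key, (left, right)) =>
  (job, left.line + job.delim + right.getValue())
}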
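Since the snippet above depends on a `job` object and a `LineParser` class that are not shown, here is a minimal self-contained sketch of the same pair-RDD join applied to the two files from the question. It rests on several assumptions: the files are comma-delimited with no header row and no quoted fields, Orders is column 0 in File1 and column 3 in File2, and the names `JoinCsvByOrders`, `keyByColumn`, and `joinOnOrders` are made up for illustration. In-memory sample rows stand in for `sc.textFile(...)`.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object JoinCsvByOrders {
  // Split each line into fields and key the row by the field at keyIndex.
  // Assumes a plain delimiter with no quoting or escaping.
  def keyByColumn(lines: RDD[String], keyIndex: Int, delim: String = ","): RDD[(String, Array[String])] =
    lines.map(_.split(delim, -1)).keyBy(fields => fields(keyIndex))

  // Inner join of File1 and File2 on the Orders column.
  def joinOnOrders(file1: RDD[String], file2: RDD[String]): RDD[String] = {
    val left  = keyByColumn(file1, 0) // Orders | Requests | Book1 | Book2
    val right = keyByColumn(file2, 3) // Book3 | Book4 | Book5 | Orders
    left.join(right).map { case (_, (l, r)) =>
      // Keep all File1 fields, then Book3..Book5 from File2 (Orders is not repeated).
      (l ++ r.take(3)).mkString(",")
    }
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("join-csv").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // Hypothetical sample rows; replace with sc.textFile(path) for real files.
      val file1 = sc.parallelize(Seq("A1,10,1,2", "A2,20,3,4"))
      val file2 = sc.parallelize(Seq("x,y,z,A1", "p,q,r,A3"))
      // Only order A1 appears in both inputs, so this prints: A1,10,1,2,x,y,z
      joinOnOrders(file1, file2).collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

Because `join` is an inner join, orders present in only one file are dropped; `leftOuterJoin` or `fullOuterJoin` on the same keyed RDDs would keep them if that is what the check needs.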