I have a CSV file in the following format:
key, age, marks, feature_n
abc, 23, 84, 85.3
xyz, 25, 67, 70.2
The number of feature columns can vary; in this example there are three (age, marks and feature_n). I have to convert it into a Map[String,String] as below:
[key,value]
["abc","age:23,marks:84,feature_n:85.3"]
["xyz","age:25,marks:67,feature_n:70.2"]
I then have to join this data with another dataset A on the column 'key' and append the 'value' to a column of dataset A. The CSV file can be loaded into a DataFrame whose schema is defined by its header row:
val newRecords = sparkSession.read.option("header", "true").option("mode", "DROPMALFORMED").csv("/records.csv")
After that, I will join the DataFrame newRecords with dataset A and append the 'value' to one of the columns of dataset A.
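For the join step described above, a minimal sketch could look like the following. Here `keyValue` is an assumed DataFrame with columns (key, value) derived from newRecords, and `datasetA` is the other dataset; both names are placeholders, not anything defined in this post:

```scala
// Left outer join keeps every row of dataset A and attaches "value"
// wherever a matching key exists in keyValue; unmatched rows get null.
val enriched = datasetA.join(keyValue, Seq("key"), "left_outer")
```

Using `Seq("key")` as the join condition avoids a duplicate `key` column in the result, which a plain column-equality condition would produce.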
How can I iterate over each column of each row, excluding the column "key", and generate a string of the format "age:23,marks:84,feature_n:85.3" from newRecords?
I can alter the format of the CSV file and supply the data as JSON if that helps.
I am fairly new to Scala and Spark.
I suggest the following solution:
val updated: RDD[String] = newRecords.drop("key").rdd.map { el =>
  val a = el.toSeq
  // Build "age:23,marks:84,feature_n:85.3" from the remaining columns
  s"age:${a(0)},marks:${a(1)},feature_n:${a(2)}"
}
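Since the number of feature columns can vary, a more general sketch builds the "name:value" fragments from the column names at runtime instead of hard-coding age, marks and feature_n. It assumes only the `newRecords` DataFrame loaded above, with its "key" column:

```scala
import org.apache.spark.sql.functions.{col, concat, concat_ws, lit}

// Every column except "key" becomes a "name:value" fragment.
val featureCols = newRecords.columns.filter(_ != "key")
val fragments = featureCols.map(c => concat(lit(c + ":"), col(c)))

// Join the fragments with commas, e.g. "age:23,marks:84,feature_n:85.3".
val keyValue = newRecords.select(col("key"), concat_ws(",", fragments: _*).as("value"))
```

The resulting `keyValue` DataFrame can be joined with dataset A directly. If a driver-side Map[String,String] is really needed, `keyValue.rdd.map(r => (r.getString(0), r.getString(1))).collectAsMap()` will materialize it, but that is only safe for data small enough to fit on the driver.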