I have a CSV file in the following format:
key, age, marks, feature_n
abc, 23, 84, 85.3
xyz, 25, 67, 70.2
The number of feature columns can vary; in this example there are three (age, marks and feature_n). I have to convert it into a Map[String,String] as below:
[key,value]
["abc","age:23,marks:84,feature_n:85.3"]
["xyz","age:25,marks:67,feature_n:70.2"]
I then have to join this data with another dataset A on the column 'key' and append the 'value' to a column of dataset A. The CSV file can be loaded into a DataFrame whose schema is defined by its header row:
val newRecords = sparkSession.read.option("header", "true").option("mode", "DROPMALFORMED").csv("/records.csv")
After that, I will join the DataFrame newRecords with dataset A and append the 'value' to one of the columns of dataset A.
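For the join step described above, a minimal sketch could look like the following. Here `keyValue` is an assumed DataFrame with columns (key, value) derived from newRecords, and `datasetA` is the other dataset; both names are placeholders, not anything defined in this post:

```scala
// Left outer join keeps every row of dataset A and attaches "value"
// wherever a matching key exists in keyValue; unmatched rows get null.
val enriched = datasetA.join(keyValue, Seq("key"), "left_outer")
```

Using `Seq("key")` as the join condition avoids a duplicate `key` column in the result, which a plain column-equality condition would produce.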
How can I iterate over each column of each row, excluding the column "key", and generate a string of the format "age:23,marks:84,feature_n:85.3" from newRecords?
I can alter the format of the CSV file and supply the data as JSON if that helps.
I am fairly new to Scala and Spark.
I suggest the following solution:
val updated: RDD[String] = newRecords.drop("key").rdd.map { el =>
  val a = el.toSeq
  // Build "age:23,marks:84,feature_n:85.3" from the remaining columns
  s"age:${a(0)},marks:${a(1)},feature_n:${a(2)}"
}
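Since the number of feature columns can vary, a more general sketch builds the "name:value" fragments from the column names at runtime instead of hard-coding age, marks and feature_n. It assumes only the `newRecords` DataFrame loaded above, with its "key" column:

```scala
import org.apache.spark.sql.functions.{col, concat, concat_ws, lit}

// Every column except "key" becomes a "name:value" fragment.
val featureCols = newRecords.columns.filter(_ != "key")
val fragments = featureCols.map(c => concat(lit(c + ":"), col(c)))

// Join the fragments with commas, e.g. "age:23,marks:84,feature_n:85.3".
val keyValue = newRecords.select(col("key"), concat_ws(",", fragments: _*).as("value"))
```

The resulting `keyValue` DataFrame can be joined with dataset A directly. If a driver-side Map[String,String] is really needed, `keyValue.rdd.map(r => (r.getString(0), r.getString(1))).collectAsMap()` will materialize it, but that is only safe for data small enough to fit on the driver.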