How to split a single Dataset column into multiple columns in Spark. I found something in PySpark and tried to implement the same approach in Java, but how can I extend this to n
columns without specifying any schema?
The Dataset looks like this:
+--------------------------------------------------------------------------------------------------------------------------------+
|data                                                                                                                            |
+--------------------------------------------------------------------------------------------------------------------------------+
|0311111111111111|00000005067242541501|18275008905683|86.80||DESC\|123|10000003|2|1145                                           |
|0311111111111111|00000005067242541501|B8426621002A|500.00||DESC\|TRF |10000015|28|1170                                          |
+--------------------------------------------------------------------------------------------------------------------------------+
Columns:
id, tid, mid, amount, mname, desc, brand, brandId, mcc
The desc column can contain |, which is also the field delimiter. In cases where a field contains '|', can we wrap the field in double quotes?
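CSV-style quoting would work as a convention. Below is a minimal plain-Java sketch of the idea; the helper names `quoteField` and `parseRecord` are illustrative, not part of any Spark API:

```java
import java.util.ArrayList;
import java.util.List;

public class QuotedFields {
    // Wrap a field in double quotes only if it contains the delimiter
    static String quoteField(String field) {
        if (field.contains("|")) {
            return "\"" + field + "\"";
        }
        return field;
    }

    // Split a record on '|' but keep quoted fields intact
    static List<String> parseRecord(String record) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (char c : record.toCharArray()) {
            if (c == '"') {
                inQuotes = !inQuotes;            // toggle quoted state, drop the quote
            } else if (c == '|' && !inQuotes) {
                fields.add(current.toString());  // unquoted delimiter ends the field
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }
}
```

With this convention, `parseRecord("a|\"DESC|123\"|b")` yields the three fields `a`, `DESC|123`, and `b`. Note that Spark's built-in CSV reader supports the same idea through its `sep` and `quote` options if you can change how the file is produced.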
Note that in Java source you have to write the dataset string (the variable str) with a doubled backslash, because \| is not a valid escape sequence in a Java string literal (the valid ones are \b \t \n \f \r \" \' \\). You can try it this way:
String str = "0311111111111111|00000005067242541501|18275008905683|86.80||DESC\\|123|10000003|2|1145";
// Split the record into the part before the backslash and the part from it onwards
String str1 = str.substring(0, str.indexOf("\\"));
String str2 = str.substring(str.indexOf("\\"));
String[] split1 = str1.split("\\|");
String[] split2 = str2.split("\\|");
// For the desc column: rejoin the two halves around the literal '|'
// (split2[0] is the lone backslash from the escape, so skip it)
String desc = split1[split1.length - 1] + "|" + split2[1];
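A more general alternative is to split the whole record in one pass with a regex that ignores escaped delimiters, which extends naturally to all n columns. A plain-Java sketch (class and method names are illustrative):

```java
public class SplitRecord {
    // Split on '|' only when it is NOT preceded by a backslash, so the
    // escaped delimiter inside desc ("DESC\|123") stays in one field.
    private static final String DELIMITER = "(?<!\\\\)\\|";

    public static String[] splitRecord(String record) {
        // limit -1 keeps trailing empty fields
        String[] fields = record.split(DELIMITER, -1);
        // Unescape "\|" back to a literal "|" inside each field
        for (int i = 0; i < fields.length; i++) {
            fields[i] = fields[i].replace("\\|", "|");
        }
        return fields;
    }

    public static void main(String[] args) {
        String str = "0311111111111111|00000005067242541501|18275008905683|86.80||DESC\\|123|10000003|2|1145";
        String[] fields = splitRecord(str);
        System.out.println(fields.length);  // 9 columns: id .. mcc
        System.out.println(fields[5]);      // DESC|123
    }
}
```

The same `(?<!\\\\)\\|` pattern can be handed to Spark's `functions.split` on the `data` column, and the resulting array column indexed with `getItem(0)` through `getItem(8)` to produce the n output columns without declaring a schema.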