How to split a single Dataset column into multiple columns

How can I split a single Dataset column into multiple columns in Spark? I found an approach in PySpark and tried to implement the same thing in Java, but how can I extend it to n columns without specifying a schema?

The Dataset looks like this:

    data                                                                                                                            |
    +--------------------------------------------------------------------------------------------------------------------------------+
    |0311111111111111|00000005067242541501|18275008905683|86.80||DESC\|123|10000003|2|1145                                           |
    |0311111111111111|00000005067242541501|B8426621002A|500.00||DESC\|TRF |10000015|28|1170                                          |
    +--------------------------------------------------------------------------------------------------------------------------------+

    Columns:

    id, tid, mid, amount, mname, desc, brand, brandId, mcc

The desc column can contain |, which is also the field delimiter. Where a field contains '|', can we wrap that field in double quotes?

In Java source, the dataset string value (the variable str below) has to be written with the backslash doubled, because \| is an invalid escape sequence in a string literal (the valid ones include \b \t \n \f \r \" \' \\). You can try it this way:

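    // The backslash is doubled in Java source; the actual value contains DESC\|123.
    String str = "0311111111111111|00000005067242541501|18275008905683|86.80||DESC\\|123|10000003|2|1145";
    // Cut the row at the first backslash, i.e. just before the escaped pipe.
    // This assumes the row contains a backslash; indexOf returns -1 otherwise.
    int cut = str.indexOf("\\");
    String str1 = str.substring(0, cut);
    String str2 = str.substring(cut);
    // split takes a regex, so the pipe itself has to be escaped.
    String[] split1 = str1.split("\\|");
    String[] split2 = str2.split("\\|");
    // DESC column: last field of the first half, the backslash, then "|" and the next field.
    String desc = split1[split1.length - 1] + split2[0] + "|" + split2[1];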
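The substring approach above only handles one escaped pipe per row. As a rough sketch of how it could extend to any number of columns, the row can instead be split with a regex that skips pipes preceded by a backslash. The class name EscapedPipeSplit and the method splitUnescaped are made up for this illustration, and it assumes the backslash is used only to escape a pipe (an escaped backslash right before a delimiter would be misread).

```java
import java.util.Arrays;

public class EscapedPipeSplit {

    // Split on '|' only when it is NOT preceded by a backslash.
    // (?<!\\) is a negative lookbehind; the limit -1 keeps trailing empty fields.
    public static String[] splitUnescaped(String line) {
        String[] fields = line.split("(?<!\\\\)\\|", -1);
        for (int i = 0; i < fields.length; i++) {
            // Turn the escaped delimiter "\|" back into a plain '|'.
            fields[i] = fields[i].replace("\\|", "|");
        }
        return fields;
    }

    public static void main(String[] args) {
        String line = "0311111111111111|00000005067242541501|18275008905683|86.80||DESC\\|123|10000003|2|1145";
        String[] fields = splitUnescaped(line);
        System.out.println(fields.length);           // 9 fields for the sample row
        System.out.println(Arrays.toString(fields)); // desc comes out as DESC|123
    }
}
```

Spark's functions.split(Column, String) also takes a Java regular expression, so the same pattern may carry over to the data column of the Dataset, with each field then picked out by index; that part is not tested here.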
