
Convert tab separated string to comma separated string using Java

I have the contents of a tab-separated file as a string. The example below contains two rows. In the first row the values are separated by tabs, but one value also contains newline characters. The second row contains no newline characters.

The original data also has a header. I want to read the whole string, both the header and the values, and convert it to a CSV string. When I read this data line by line using CSVParser, I don't get the proper values, because some columns contain \n (newline) characters as well.

However, each "row" is terminated by the same string, i.e.

"test2222"

1st row

"abc"   "cde"   "fhg"   "ijk"   "17/01/23 10:09:50 am"  "test111"   "test2" "Individual"    "Enclosure of Work Areas"       "Highlight aluminium personnel lanyarded into the Haulotte boom lift with a spotter. All tools observed to be lanyarded including protection gear. 
Blue glue asset card observed to be attached to the machinery, 10 year inspection of plant not required due to it being only 3yrs old. Last annual inspection august 2022 and logbook was subsequently observed. 
Plant registration was all observed and the weight loads were all abided by."   "test2222"

2nd row

"abc"   "cde"   "fhg"   "ijk"   "17/01/23 10:09:50 am"  "test111"   "test2" "Individual"    "Enclosure of Work Areas"       "1" "0" "Level 79"  "16/01/23 11:12:50 pm"  "Logistics - Construction Personnel & Material Lifts"                   "Schindler lift cages were observed to be free of any loose debris or material that may pose a risk of falling into the lift shaft below. L80 and L79 were observed to be compliant on both sides of the shaft."    "test2222"

Data as String

String test = "\"abc\"\t\"cde\"\t\"fhg\"\t\"ijk\"\t\"17/01/23 10:09:50 am\"\t\"test111\"\t\"test2\"\t\"Individual\"\t\"Enclosure of Work Areas\"\t\t\"Highlight aluminium personnel lanyarded into the Haulotte boom lift with a spotter. All tools observed to be lanyarded including protection gear. \n" +
            "Blue glue asset card observed to be attached to the machinery, 10 year inspection of plant not required due to it being only 3yrs old. Last annual inspection august 2022 and logbook was subsequently observed. \n" +
            "Plant registration was all observed and the weight loads were all abided by.\"\t\"test2222\"\n" +
            "\"abc\"\t\"cde\"\t\"fhg\"\t\"ijk\"\t\"17/01/23 10:09:50 am\"\t\"test111\"\t\"test2\"\t\"Individual\"\t\"Enclosure of Work Areas\"\t\t\"1\"\t\"0\"\t\"Level 79\"\t\"16/01/23 11:12:50 pm\"\t\"Logistics - Construction Personnel & Material Lifts\"\t\t\t\t\t\"Schindler lift cages were observed to be free of any loose debris or material that may pose a risk of falling into the lift shaft below. L80 and L79 were observed to be compliant on both sides of the shaft.\"\t\"test2222\"";

Can anyone please help me resolve this? Thanks in advance!

You can use a Scanner to read the file. Set the delimiter to the string that terminates a "row" in your file. The code below demonstrates this.

import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Scanner;

public class Main {

    public static void main(String[] args) {
        Path source = Paths.get("sampldat.txt");
        // try-with-resources closes the Scanner (and the underlying file) automatically.
        try (Scanner reader = new Scanner(source)) {
            // Treat the row terminator, not the newline, as the token delimiter.
            reader.useDelimiter("\"test2222\"");
            int counter = 0;
            while (reader.hasNext()) {
                // One "row": everything up to the next terminator.
                String line = reader.next();
                // Split on any run of whitespace (spaces, tabs, newlines).
                String[] fields = line.split("\\s+");
                System.out.println("Row: " + ++counter);
                System.out.println(String.join(",", fields));
            }
        }
        catch (IOException xIo) {
            xIo.printStackTrace();
        }
    }
}

The contents of file sampldat.txt are the two sample "rows" from your question, i.e.

"abc"   "cde"   "fhg"   "ijk"   "17/01/23 10:09:50 am"  "test111"   "test2" "Individual"    "Enclosure of Work Areas"       "Highlight aluminium personnel lanyarded into the Haulotte boom lift with a spotter. All tools observed to be lanyarded including protection gear. 
Blue glue asset card observed to be attached to the machinery, 10 year inspection of plant not required due to it being only 3yrs old. Last annual inspection august 2022 and logbook was subsequently observed. 
Plant registration was all observed and the weight loads were all abided by."   "test2222"
"abc"   "cde"   "fhg"   "ijk"   "17/01/23 10:09:50 am"  "test111"   "test2" "Individual"    "Enclosure of Work Areas"       "1" "0" "Level 79"  "16/01/23 11:12:50 pm"  "Logistics - Construction Personnel & Material Lifts"                   "Schindler lift cages were observed to be free of any loose debris or material that may pose a risk of falling into the lift shaft below. L80 and L79 were observed to be compliant on both sides of the shaft."    "test2222"

I use try-with-resources to make sure that the file is closed.

Method next reads up to the next occurrence of the delimiter, i.e. the string "test2222" (including its double quotes). The delimiter itself is not part of the returned value.

I then split the value returned by method next on any run of whitespace, i.e. spaces, tabs, newlines, etc. Note that this also splits on the spaces inside quoted values.

I then call the static method join (of class java.lang.String) to create a comma-separated list, as you requested.

I added a header to the printout indicating which "row" is printed.
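Since the question holds the data in a String rather than a file, the same delimiter idea can be tried directly on a String. The sketch below is only an illustration: the class name DelimiterDemo, the helper chunks and the terminator "END" are made up for the example. Scanner.useDelimiter interprets its argument as a regular expression, so the terminator is wrapped in Pattern.quote in case it ever contains regex metacharacters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;

class DelimiterDemo {

    // Hypothetical helper: returns the text between consecutive terminators.
    static List<String> chunks(String text, String terminator) {
        List<String> result = new ArrayList<>();
        try (Scanner scanner = new Scanner(text)) {
            // useDelimiter expects a regex; Pattern.quote makes the terminator literal.
            scanner.useDelimiter(Pattern.quote(terminator));
            while (scanner.hasNext()) {
                result.add(scanner.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Two records; the first one has a newline inside a quoted value.
        String text = "\"a\"\t\"line one\nline two\"\t\"END\"\n\"b\"\t\"c\"\t\"END\"";
        List<String> rows = chunks(text, "\"END\"");
        System.out.println(rows.size()); // 2
    }
}
```

Each chunk still carries the tab that preceded the terminator, and every chunk after the first starts with the newline that followed the previous terminator.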

This is the output I get:

Row: 1
"abc","cde","fhg","ijk","17/01/23,10:09:50,am","test111","test2","Individual","Enclosure,of,Work,Areas","Highlight,aluminium,personnel,lanyarded,into,the,Haulotte,boom,lift,with,a,spotter.,All,tools,observed,to,be,lanyarded,including,protection,gear.,Blue,glue,asset,card,observed,to,be,attached,to,the,machinery,,10,year,inspection,of,plant,not,required,due,to,it,being,only,3yrs,old.,Last,annual,inspection,august,2022,and,logbook,was,subsequently,observed.,Plant,registration,was,all,observed,and,the,weight,loads,were,all,abided,by."
Row: 2
,"abc","cde","fhg","ijk","17/01/23,10:09:50,am","test111","test2","Individual","Enclosure,of,Work,Areas","1","0","Level,79","16/01/23,11:12:50,pm","Logistics,-,Construction,Personnel,&,Material,Lifts","Schindler,lift,cages,were,observed,to,be,free,of,any,loose,debris,or,material,that,may,pose,a,risk,of,falling,into,the,lift,shaft,below.,L80,and,L79,were,observed,to,be,compliant,on,both,sides,of,the,shaft."
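As the output shows, splitting on any whitespace also breaks quoted values such as "Enclosure of Work Areas" at every space, drops the empty columns, and leaves a leading comma on row 2. If that is not what you want, a possible variation is to split on the tab character only. The following is a minimal sketch, not a full CSV writer. It assumes that every record ends with "test2222", that a tab never occurs inside a quoted value, and that values contain no embedded double quotes. The class name TsvToCsv and the method toCsvLines are made up for the example. Re-appending the terminator as the last column and flattening embedded line breaks to a single space are choices of this sketch, not requirements.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;

class TsvToCsv {

    // Assumption: every record ends with this quoted marker, as in the question.
    static final String TERMINATOR = "\"test2222\"";

    static List<String> toCsvLines(String data) {
        List<String> lines = new ArrayList<>();
        try (Scanner scanner = new Scanner(data)) {
            scanner.useDelimiter(Pattern.quote(TERMINATOR));
            while (scanner.hasNext()) {
                // Drop the line break left over from the end of the previous record.
                String chunk = scanner.next().replaceFirst("^\\R+", "");
                if (chunk.isEmpty()) {
                    continue; // e.g. a trailing newline at the end of the data
                }
                // Put the terminator back as the last column, then split on tabs only.
                // The limit -1 keeps empty columns.
                String[] fields = (chunk + TERMINATOR).split("\t", -1);
                for (int i = 0; i < fields.length; i++) {
                    // Flatten line breaks inside a value to a single space.
                    fields[i] = fields[i].replaceAll("\\s*\\R\\s*", " ");
                }
                lines.add(String.join(",", fields));
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        String data = "\"a\"\t\"b c\"\t\t\"line one \nline two\"\t\"test2222\"\n"
                + "\"d\"\t\"e\"\t\"test2222\"";
        toCsvLines(data).forEach(System.out::println);
    }
}
```

Because the values stay wrapped in their original double quotes, commas inside a value (for example "machinery, 10 year inspection") remain inside a quoted CSV field.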
