I am writing Java code to read and interpret a TSV file. I would like to find a regular expression that can split the lines of the file, knowing:

Sample input lines:
"aaa" 123 "bbb" "cc" "ddd"
"aaa" 123 "bbb" "cc" " 6"
"ddd" 456 "eee" "ff" " "" "
"ddd" 456 "eee" "ff" " "" aaa "" "
(Please note: the quoted strings at the end of the last three lines contain tabs.)
My current regex is ("[^"]*"*|[^\\t]+)+, but it fails on the last example (it produces smaller substrings).
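Let's settle the case: split only on tabs that are followed by an even number of double quotes up to the end of the line, i.e. tabs that are not inside a quoted string: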
\t(?=(?:[^"]*"[^"]*")*[^"]*$)
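The lookahead succeeds only when the rest of the line, from the tab to the end, holds an even number of double quotes; a doubled "" inside a string adds two quotes and leaves the parity unchanged. The sketch below spells out that same condition by hand and compares it with the regex on the last sample line. The class and method names are illustrative, and it assumes every whitespace gap in that line is a tab.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaheadCheck {
    // Same pattern as above: a tab followed by an even number of double quotes.
    static final Pattern TAB_OUTSIDE_QUOTES =
            Pattern.compile("\\t(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)");

    // Positions of the tabs where the regex would split the line.
    static List<Integer> splitPositionsByRegex(String line) {
        List<Integer> positions = new ArrayList<>();
        Matcher m = TAB_OUTSIDE_QUOTES.matcher(line);
        while (m.find()) {
            positions.add(m.start());
        }
        return positions;
    }

    // The lookahead written out by hand: a tab qualifies when the number
    // of quotes between it and the end of the line is even.
    static List<Integer> splitPositionsByHand(String line) {
        List<Integer> positions = new ArrayList<>();
        for (int i = 0; i < line.length(); i++) {
            if (line.charAt(i) != '\t') {
                continue;
            }
            int quotesAfter = 0;
            for (int j = i + 1; j < line.length(); j++) {
                if (line.charAt(j) == '"') {
                    quotesAfter++;
                }
            }
            if (quotesAfter % 2 == 0) {
                positions.add(i);
            }
        }
        return positions;
    }

    public static void main(String[] args) {
        // Last sample line, assuming all whitespace in it is tabs.
        String line = "\"ddd\"\t456\t\"eee\"\t\"ff\"\t\"\t\"\"\taaa\t\"\"\t\"";
        System.out.println("regex:   " + splitPositionsByRegex(line));
        System.out.println("by hand: " + splitPositionsByHand(line));
    }
}
```

Both lines print [5, 9, 15, 20]: the four tabs between fields qualify, while the four tabs inside the last string are skipped. Note that both versions rescan the rest of the line for every tab, so the work grows quadratically with line length; for short lines like these that is harmless.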
Sample code:
import java.util.regex.Pattern;

public class Example {
    public static void main(String[] args) {
        // Fields are separated by tabs; the quoted strings at the end of
        // the last three lines contain tabs as well.
        String sourcestring = "\"aaa\"\t123\t\"bbb\"\t\"cc\"\t\"ddd\"\n"
                + "\"aaa\"\t123\t\"bbb\"\t\"cc\"\t\"\t6\"\n"
                + "\"ddd\"\t456\t\"eee\"\t\"ff\"\t\"\t\"\"\t\"\n"
                + "\"ddd\"\t456\t\"eee\"\t\"ff\"\t\"\t\"\"\taaa\t\"\"\t\"";
        Pattern reLines = Pattern.compile("\\n");
        // A tab followed by an even number of double quotes up to the end
        // of the line, i.e. a tab that is not inside a quoted string.
        Pattern reTsv = Pattern.compile("\\t(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)");
        String[] lines = reLines.split(sourcestring);
        for (String line : lines) {
            String[] parts = reTsv.split(line);
            for (int partsIdx = 0; partsIdx < parts.length; partsIdx++) {
                System.out.println("[" + partsIdx + "] = " + parts[partsIdx]);
            }
        }
    }
}
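If the quadratic rescanning ever matters (very long lines), the same split can be done in one left-to-right pass that flips an "inside quotes" flag on every double quote. This is a swapped-in technique, not the regex approach of the answer; the class name TsvSplitter is hypothetical, and the sketch assumes well-formed lines whose quotes are balanced. On such lines, "even number of quotes after the tab" and "even number of quotes before the tab" are the same condition, so it splits at the same places as the regex. On a line with an unbalanced quote the two can disagree.

```java
import java.util.ArrayList;
import java.util.List;

public class TsvSplitter {
    // Splits one line on tabs that are outside double-quoted strings.
    // Assumes quotes inside a string are doubled (""), as in the samples;
    // each quote flips the state, so a doubled quote leaves it unchanged.
    // Empty fields, including trailing ones, are kept.
    static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        boolean inQuotes = false;
        int fieldStart = 0;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;
            } else if (c == '\t' && !inQuotes) {
                fields.add(line.substring(fieldStart, i));
                fieldStart = i + 1;
            }
        }
        fields.add(line.substring(fieldStart)); // the last field
        return fields;
    }

    public static void main(String[] args) {
        // Last sample line, assuming all whitespace in it is tabs.
        String line = "\"ddd\"\t456\t\"eee\"\t\"ff\"\t\"\t\"\"\taaa\t\"\"\t\"";
        List<String> fields = split(line);
        for (int i = 0; i < fields.size(); i++) {
            System.out.println("[" + i + "] = " + fields.get(i));
        }
    }
}
```

One design difference worth knowing: Pattern.split(CharSequence) drops trailing empty strings, so a line ending in an empty field loses it; this scanner keeps it. With the regex, reTsv.split(line, -1) keeps trailing empty fields too.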
Output:
[0] = "aaa"
[1] = 123
[2] = "bbb"
[3] = "cc"
[4] = "ddd"
[0] = "aaa"
[1] = 123
[2] = "bbb"
[3] = "cc"
[4] = " 6"
[0] = "ddd"
[1] = 456
[2] = "eee"
[3] = "ff"
[4] = " "" "
[0] = "ddd"
[1] = 456
[2] = "eee"
[3] = "ff"
[4] = " "" aaa "" "
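The parts above are still raw tokens, quotes included. To interpret them as values, one option is sketched below, assuming CSV-style quoting (a quoted field starts and ends with a quote, and a literal quote inside it is written as ""), which is what the sample data suggests but the question does not state. The helper name unquote is hypothetical.

```java
public class TsvField {
    // Hypothetical helper: turns a raw field such as "a""b" into a"b.
    // Assumes CSV-style quoting: the field is wrapped in quotes and a
    // literal quote inside it is doubled.
    static String unquote(String raw) {
        if (raw.length() >= 2 && raw.startsWith("\"") && raw.endsWith("\"")) {
            return raw.substring(1, raw.length() - 1).replace("\"\"", "\"");
        }
        return raw; // unquoted field such as 123
    }

    public static void main(String[] args) {
        System.out.println(unquote("\"aaa\"")); // aaa
        System.out.println(unquote("123"));     // 123
        // Last field of the last sample line, assuming its gaps are tabs.
        System.out.println(unquote("\"\t\"\"\taaa\t\"\"\t\""));
    }
}
```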