
How NOT to split after a combination of letters and numbers

For a project I need to split the following string

210,'T99, Woody & Paul',1,'Geen omschrijving',5,3,7,'2008-04-12 21:00:00',16

Into this

210
'T99, Woody & Paul'
1
'Geen omschrijving'
5
3
7
'2008-04-12 21:00:00'
16

With this expression I've been able to split after the ',

(?<=')

I've tried a lot of things, but I haven't been able to split the integers without screwing up the 'T99, Woody & Paul' part.

Is it even possible to do this?

Assuming the format remains as simple as you've described (no single quotes inside a quoted field), you can match each field instead of splitting. The following will work:

(?<=^|,)('[^']*'|[^,]*)

which you can see at http://rubular.com/r/wuPzWXOK0w
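
The Rubular link tests the expression in Ruby. As a rough sketch of how the same expression might be used from Java, the snippet below collects every match with Matcher.find() instead of calling split. The class and method names (QuotedFieldMatcher, fields) are made up for this illustration, and it assumes, like the answer, that a quoted field never contains a single quote.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the original answer.
final class QuotedFieldMatcher {
    // A field is either a '...' run with no inner quote, or a run of non-comma characters.
    // The lookbehind requires the field to start at the beginning of the input or right after a comma.
    private static final Pattern FIELD = Pattern.compile("(?<=^|,)('[^']*'|[^,]*)");

    static List<String> fields(String line) {
        List<String> out = new ArrayList<>();
        Matcher m = FIELD.matcher(line);
        while (m.find()) {
            out.add(m.group(1)); // group 1 is the field itself, quotes included
        }
        return out;
    }

    public static void main(String[] args) {
        String test = "210,'T99, Woody & Paul',1,'Geen omschrijving',5,3,7,'2008-04-12 21:00:00',16";
        for (String field : fields(test)) {
            System.out.println(field);
        }
    }
}
```

Matching fields rather than splitting on separators keeps the rule for "what is a field" in one place, which tends to be easier to adjust if the format grows.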

If the commas inside the single quotes are always followed by a space and the commas outside the single quotes never are, you can simply split on every comma that is not followed by whitespace:

String test = "210,'T99, Woody & Paul',1,'Geen omschrijving',5,3,7,'2008-04-12 21:00:00',16";
String[] splitted0 = test.split(",(?!\\s)");
for (String split: splitted0) {
    System.out.println(split);
}

Output:

210
'T99, Woody & Paul'
1
'Geen omschrijving'
5
3
7
'2008-04-12 21:00:00'
16

Alternative solution with an actual Pattern, reusing the test string from above:

Pattern p = Pattern.compile("(?<=,|^)('?).+?\\1(?=,|$)");
Matcher m = p.matcher(test);
while (m.find()) {
    System.out.println(m.group());
}

Output:

210
'T99, Woody & Paul'
1
'Geen omschrijving'
5
3
7
'2008-04-12 21:00:00'
16

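The second solution doesn't rely on the space convention: a space after a separating comma is simply kept as part of the matched field (you can always remove it with String.trim). It does expect a quoted field to start directly after the comma, though. If a space precedes the opening ', group 1 matches nothing and a comma inside the quotes ends the match early.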

Explanation for the non-trivial "alternate" solution:

(?<=,|^)

--> anything preceded by start of input or comma

('?)

--> optionally starts with '

.+?

--> one or more of any character, as few as possible, up to...

\\1

--> back-reference to group 1: ' or nothing (the regex itself is \1; the doubled backslash is only the Java string escape)

(?=,|$)

--> followed by , or end of input
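
The breakdown above can also be kept next to the pattern itself. The following is a minimal sketch that compiles the same expression with Pattern.COMMENTS, so each part carries its explanation as an inline # comment. The class name AnnotatedFieldPattern and the method fields are hypothetical, chosen only for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: the class and method names are hypothetical.
final class AnnotatedFieldPattern {
    // With Pattern.COMMENTS, whitespace in the pattern is ignored and "#" starts a comment
    // that runs to the end of the line.
    static final Pattern FIELD = Pattern.compile(
          "(?<=,|^)  # preceded by a comma or the start of input\n"
        + "('?)      # group 1: an optional opening quote\n"
        + ".+?       # one or more characters, as few as possible\n"
        + "\\1       # whatever group 1 captured: a quote or nothing\n"
        + "(?=,|$)   # followed by a comma or the end of input\n",
        Pattern.COMMENTS);

    static List<String> fields(String line) {
        List<String> out = new ArrayList<>();
        Matcher m = FIELD.matcher(line);
        while (m.find()) {
            out.add(m.group()); // the whole match, quotes included
        }
        return out;
    }

    public static void main(String[] args) {
        String test = "210,'T99, Woody & Paul',1,'Geen omschrijving',5,3,7,'2008-04-12 21:00:00',16";
        for (String field : fields(test)) {
            System.out.println(field);
        }
    }
}
```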

This is a regex pattern I recently used in a project to split entries in a CSV file, where cells containing commas are protected with double quotes:

,(?=(?:[^"]*"[^"]*")*[^"]*$)

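It splits on every comma that is followed by an even number of double quotes, in other words every comma that is outside a quoted cell. Swap the double quotes for single quotes and you get the same behavior for single-quoted cells.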

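This pattern doesn't work when a cell contains extra quotes, because it relies on the quotes being balanced. Whether it works on cells containing newlines depends on whether multiline mode is enabled: the lookahead has to count quotes up to the end of the whole input, so it only behaves correctly when $ does not also match at line breaks.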
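
As a sketch of the single-quote variant applied to the string from the question, the snippet below swaps every " in the pattern for ' and passes it to String.split. The class name QuoteAwareSplit and its split method are assumptions made for this example, and the same limitation applies: the quotes in the input must be balanced.

```java
// Hypothetical wrapper for illustration; only String.split and the regex come from the answer.
final class QuoteAwareSplit {
    // Split on a comma only if an even number of single quotes follows it up to the end of
    // the input, which means the comma is not inside a quoted cell.
    private static final String SEPARATOR = ",(?=(?:[^']*'[^']*')*[^']*$)";

    static String[] split(String line) {
        return line.split(SEPARATOR);
    }

    public static void main(String[] args) {
        String test = "210,'T99, Woody & Paul',1,'Geen omschrijving',5,3,7,'2008-04-12 21:00:00',16";
        for (String cell : split(test)) {
            System.out.println(cell);
        }
    }
}
```

Because the lookahead rescans the rest of the input for every comma, this approach is best suited to short lines rather than very large inputs.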
