I am looking for a regular expression to use with preg_match_all in PHP 5 that splits a string on commas, as long as the commas are not inside single quotes, while still allowing for escaped single quotes. Example data would be:
(some_array, 'some, string goes here','another_string','this string may contain "double quotes" but, it can\'t split, on escaped single quotes', anonquotedstring, 83448545, 1210597346 + '000', 1241722133 + '000')
This should produce a match that looks like this:
(some_array
'some, string goes here'
'another_string'
'this string may contain "double quotes" but, it can\'t split, on escaped single quotes'
anonquotedstring
83448545
1210597346 + '000'
1241722133 + '000')
I've tried many, many regexes... My current one looks like this, although it doesn't match 100% correctly. (It still splits some commas inside single quotes.)
"/'(.*?)(?<!(?<!\\\)\\\)'|[^,]+/"
Have you tried str_getcsv? It does exactly what you need without a regular expression.
$result = str_getcsv($str, ",", "'");
You can even implement this function in PHP versions older than 5.3 by mapping it to fgetcsv, with this snippet from a comment in the docs:
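if (!function_exists('str_getcsv')) {
    // Fallback for PHP < 5.3: parse the string through fgetcsv on an in-memory stream.
    // $escape and $eol are accepted for signature compatibility but are not used.
    function str_getcsv($input, $delimiter = ',', $enclosure = '"', $escape = null, $eol = null) {
        $temp = fopen("php://memory", "r+"); // "r+" opens the stream for reading and writing
        fwrite($temp, $input);
        fseek($temp, 0); // rewind before reading
        $r = fgetcsv($temp, 4096, $delimiter, $enclosure); // 4096 is the maximum line length read
        fclose($temp);
        return $r;
    }
}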
From PHP 5.3 onwards, you can save yourself that pain with str_getcsv:
$data=str_getcsv($input, ",", "'");
To take your example...
$input=<<<STR
(some_array, 'some, string goes here','another_string','this string may contain "double quotes" but it can\'t split on escaped single quotes', anonquotedstring, 83448545, 1210597346 + '000', 1241722133 + '000')
STR;
$data=str_getcsv($input, ",", "'");
print_r($data);
This outputs:
Array
(
    [0] => (some_array
    [1] => some, string goes here
    [2] => another_string
    [3] => this string may contain "double quotes" but it can\'t split on escaped single quotes
    [4] => anonquotedstring
    [5] => 83448545
    [6] => 1210597346 + '000'
    [7] => 1241722133 + '000')
)
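Two details of this approach are worth keeping in mind: the enclosing quotes are removed from fully quoted fields, and unquoted fields can keep the space that followed the comma. Here is a minimal sketch, assuming a shortened version of the sample string and assuming that trimming every field is acceptable for your data. The fourth argument is the escape character. Passing it explicitly is a choice made here rather than something the answer above requires.

```php
<?php
// Shortened sample string (an assumption for illustration).
$input = "(some_array, 'some, string goes here','another_string', anonquotedstring, 83448545)";

// Split on commas, treating ' as the enclosure and \ as the escape character.
$raw = str_getcsv($input, ",", "'", "\\");

// Remove the whitespace left over around each field.
$fields = array_map('trim', $raw);

print_r($fields);
```

If you need to keep the quotes around quoted fields, as in the output requested in the question, this approach is not a drop-in match, because the CSV parser strips them.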
With a look-behind, you can get something close to what you want:
$test = "(some_array, 'some, string goes here','another_string','this string may contain \"double quotes\" but, it can\'t split, on escaped single quotes', anonquotedstring, 83448545, 1210597346 + '000', 1241722133 + '000')";
preg_match_all('`
(?:[^,\']|
\'((?<=\\\\)\'|[^\'])*\')*
`x', $test, $result);
print_r($result);
This gives you the following result:
Array
(
    [0] => Array
        (
            [0] => (some_array
            [1] =>
            [2] => 'some, string goes here'
            [3] =>
            [4] => 'another_string'
            [5] =>
            [6] => 'this string may contain "double quotes" but, it can\'t split, on escaped single quotes'
            [7] =>
            [8] => anonquotedstring
            [9] =>
            [10] => 83448545
            [11] =>
            [12] => 1210597346 + '000'
            [13] =>
            [14] => 1241722133 + '000')
            [15] =>
        )
    [1] => Array
        (
            [0] =>
            [1] =>
            [2] => e
            [3] =>
            [4] => g
            [5] =>
            [6] => s
            [7] =>
            [8] =>
            [9] =>
            [10] =>
            [11] =>
            [12] => 0
            [13] =>
            [14] => 0
            [15] =>
        )
)
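The one-character entries in the second array (e, g, s, 0) come from the capture group: a group that sits inside a repetition keeps only the text of its last iteration, which here is the last character before the closing quote. The empty entries come from the outer `*`, which allows a zero-length match at each comma. Below is a sketch of a tightened variant, assuming a shortened sample string. It makes the inner group non-capturing and requires at least one character per match. It has not been checked against every edge case, such as unbalanced quotes.

```php
<?php
// A capture group inside a repetition keeps only its last iteration.
preg_match('/(a|b)+/', 'ab', $m);
$lastIteration = $m[1]; // holds 'b', not 'ab'

// Shortened sample; inside a double-quoted PHP string, \' stays as backslash + quote.
$test = "(some_array, 'some, string goes here','it can\'t split, here', 83448545)";

// Same look-behind idea, but the inner group is non-capturing and the outer
// repetition is + rather than *, so zero-length matches are not returned.
preg_match_all('~(?:[^,\']|\'(?:(?<=\\\\)\'|[^\'])*\')+~', $test, $result);

// Trim the whitespace that follows each comma.
$fields = array_map('trim', $result[0]);

print_r($fields);
```

With no capture groups in the pattern, `$result` contains only the full matches in `$result[0]`.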
I second the use of a CSV parser here; that's what they are there for.
If you're stuck with regex, you could use:
preg_match_all(
    '/\s*"          # either match " (optional preceding whitespace),
    (?:\\\\.        # followed either by an escaped character
    |               # or
    [^"]            # any character except "
    )*              # any number of times,
    "\s*            # followed by " (and optional whitespace).
    |               # Or: do the same thing for single-quoted strings.
    \s*\'(?:\\\\.|[^\'])*\'\s*
    |               # Or:
    [^,]+           # match a run of non-comma characters (i.e. any remaining unquoted strings)
    /x',
    $subject, $result, PREG_PATTERN_ORDER);
$result = $result[0];
But, as you can see, this is ugly and hard to maintain. Use the right tool for the job.
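For completeness, here is a usage sketch of the regex route, assuming a shortened sample string and a hypothetical helper name, `split_outside_quotes`, which is not a PHP built-in. The last branch is written as `[^,]+` so that it cannot match an empty string, and each match is trimmed afterwards. One known limitation: an unquoted field that contains a quoted comma partway through (for example `abc 'x, y'`) would still be split, because the last branch stops at the first comma.

```php
<?php
// Hypothetical helper, not part of PHP: returns the comma-separated pieces of
// $subject, keeping quoted strings (with backslash escapes) in one piece.
function split_outside_quotes($subject)
{
    // Branch 1: double-quoted string; branch 2: single-quoted string;
    // branch 3: any non-empty run of characters other than commas.
    $pattern = '/\s*"(?:\\\\.|[^"])*"\s*|\s*\'(?:\\\\.|[^\'])*\'\s*|[^,]+/';
    preg_match_all($pattern, $subject, $result, PREG_PATTERN_ORDER);

    // Drop the whitespace that surrounds each piece.
    return array_map('trim', $result[0]);
}

// Shortened sample; \' inside a double-quoted PHP string stays as backslash + quote.
$subject = "(some_array, 'some, string goes here','it can\'t split, here', 1210597346 + '000')";
print_r(split_outside_quotes($subject));
```

Unlike the CSV parser, this keeps the surrounding quotes on quoted fields, which is closer to the output the question asks for.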