
REGEX: Splitting by commas that are not in single quotes, allowing for escaped quotes

I am looking for a regular expression to use with preg_match_all in PHP 5 that splits a string on commas, but only on commas that are not inside single quotes, while allowing escaped single quotes (\') inside the quoted parts. Example data:

(some_array, 'some, string goes here','another_string','this string may contain "double quotes" but, it can\'t split, on escaped single quotes', anonquotedstring, 83448545, 1210597346 + '000', 1241722133 + '000')

This should produce the following matches:

(some_array
'some, string goes here'
'another_string'
'this string may contain "double quotes" but, it can\'t split, on escaped single quotes'
 anonquotedstring
 83448545
 1210597346 + '000'
 1241722133 + '000')

I've tried many, many regexes. My current one is below, but it doesn't match correctly: it still splits on some commas inside single quotes.

"/'(.*?)(?<!(?<!\\\)\\\)'|[^,]+/"

Have you tried str_getcsv? It does exactly what you need without a regular expression.

$result = str_getcsv($str, ",", "'");
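A minimal sketch applying str_getcsv to the exact sample from the question, including the commas inside the string with the escaped quote. Two details are my own additions, not part of the answer: the escape character "\\" is passed explicitly (recent PHP releases, 8.4 as far as I know, emit a deprecation notice when $escape is left at its default), and every field is trimmed because unquoted fields keep the space that follows each comma.

```php
<?php
// The question's sample data; a nowdoc keeps the backslash in can\'t literally.
$input = <<<'STR'
(some_array, 'some, string goes here','another_string','this string may contain "double quotes" but, it can\'t split, on escaped single quotes', anonquotedstring, 83448545, 1210597346 + '000', 1241722133 + '000')
STR;

// Delimiter ",", enclosure "'", escape "\" (passed explicitly).
$fields = str_getcsv($input, ",", "'", "\\");

// Unquoted fields keep their leading space, so trim every field.
$fields = array_map('trim', $fields);

print_r($fields);
```

The backslash in can\'t is kept in the parsed field: str_getcsv uses the escape character only to avoid ending the quoted field, it does not remove it.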

str_getcsv was added in PHP 5.3, but you can emulate it in older versions on top of fgetcsv with this snippet from a comment in the docs:

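if (!function_exists('str_getcsv')) {
    // Polyfill for PHP < 5.3: write the string to an in-memory stream
    // and let fgetcsv parse it as a single CSV line.
    // $escape and $eol are accepted for signature compatibility but ignored
    // (fgetcsv only gained an $escape argument in PHP 5.3).
    function str_getcsv($input, $delimiter = ',', $enclosure = '"', $escape = null, $eol = null) {
        $temp = fopen('php://memory', 'r+');            // read/write memory stream
        fwrite($temp, $input);
        rewind($temp);
        $r = fgetcsv($temp, 0, $delimiter, $enclosure); // 0 = no line-length limit
        fclose($temp);
        return $r;
    }
}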
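The polyfill above only kicks in when str_getcsv is missing, so it never runs on a modern PHP. To try the same php://memory plus fgetcsv technique anyway, here is a sketch under a hypothetical name, csv_split_via_stream (not a built-in). It assumes PHP 5.3 or later because it forwards $escape to fgetcsv.

```php
<?php
// Hypothetical helper name, chosen so it does not collide with the
// built-in str_getcsv; it uses the same in-memory stream technique.
function csv_split_via_stream($input, $delimiter = ',', $enclosure = '"', $escape = '\\')
{
    // "r+" opens the memory stream for both writing and reading.
    $stream = fopen('php://memory', 'r+');
    fwrite($stream, $input);
    rewind($stream);

    // Length 0 means no line-length limit.
    $fields = fgetcsv($stream, 0, $delimiter, $enclosure, $escape);
    fclose($stream);

    // fgetcsv returns false for empty input; normalise to an empty array.
    return $fields === false ? array() : $fields;
}

$line = "a, 'b, c', d";
print_r(csv_split_via_stream($line, ',', "'"));
```

Because both paths go through PHP's CSV parser, the result should match str_getcsv called with the same delimiter, enclosure and escape.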

From PHP 5.3 onwards you can save yourself that pain and call str_getcsv directly:

$data = str_getcsv($input, ",", "'");

Taking your example:

$input=<<<STR
(some_array, 'some, string goes here','another_string','this string may contain "double quotes" but it can\'t split on escaped single quotes', anonquotedstring, 83448545, 1210597346 + '000', 1241722133 + '000')
STR;

$data=str_getcsv($input, ",", "'");
print_r($data);

This outputs:

Array
(
    [0] => (some_array
    [1] => some, string goes here
    [2] => another_string
    [3] => this string may contain "double quotes" but it can\'t split on escaped single quotes
    [4] => anonquotedstring
    [5] => 83448545
    [6] => 1210597346 + '000'
    [7] => 1241722133 + '000')
)

With some look-behind, you can get something close to what you want:

$test = "(some_array, 'some, string goes here','another_string','this string may contain \"double quotes\" but, it can\'t split, on escaped single quotes', anonquotedstring, 83448545, 1210597346 + '000', 1241722133 + '000')";
preg_match_all('`
(?:[^,\']|
   \'((?<=\\\\)\'|[^\'])*\')*
`x', $test, $result);
print_r($result);

This gives you the following result:

Array
(
    [0] => Array
        (
            [0] => (some_array
            [1] => 
            [2] =>  'some, string goes here'
            [3] => 
            [4] => 'another_string'
            [5] => 
            [6] => 'this string may contain "double quotes" but, it can\'t split, on escaped single quotes'
            [7] => 
            [8] =>  anonquotedstring
            [9] => 
            [10] =>  83448545
            [11] => 
            [12] =>  1210597346 + '000'
            [13] => 
            [14] =>  1241722133 + '000')
            [15] => 
        )

    [1] => Array
        (
            [0] => 
            [1] => 
            [2] => e
            [3] => 
            [4] => g
            [5] => 
            [6] => s
            [7] => 
            [8] => 
            [9] => 
            [10] => 
            [11] => 
            [12] => 0
            [13] => 
            [14] => 0
            [15] => 
        )

)
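A sketch of the same look-behind idea with two changes of my own: the outer repetition uses + instead of *, so no empty matches appear between the commas, and the groups are non-capturing, so there is no stray $matches[1] holding the last character of each quoted string. Trade-offs: an empty field (two commas in a row) is skipped rather than returned, and a quoted string ending in an escaped backslash, such as 'C:\\', would still be misread because its closing quote is preceded by a backslash.

```php
<?php
$test = <<<'STR'
(some_array, 'some, string goes here','another_string','this string may contain "double quotes" but, it can\'t split, on escaped single quotes', anonquotedstring, 83448545, 1210597346 + '000', 1241722133 + '000')
STR;

// A field is one or more of: a non-comma, non-quote character, or a
// single-quoted string in which an apostrophe only counts as content
// when it is preceded by a backslash (the look-behind).
$pattern = "/(?:[^,']|'(?:(?<=\\\\)'|[^'])*')+/";

preg_match_all($pattern, $test, $matches);
$fields = $matches[0];   // whole matches only; there are no capture groups
print_r($fields);
```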

I second the use of a CSV parser here; that's what they are there for.

If you're stuck with a regex, you could use:

preg_match_all(
    '/\s*"    # either match " (optional preceding whitespace),
     (?:\\\\. # followed either by an escaped character
     |        # or
     [^"]     # any character except "
     )*       # any number of times,
    "\s*      # followed by " (and optional whitespace).
    |         # Or: do the same thing for single-quoted strings.
    \s*\'(?:\\\\.|[^\'])*\'\s*
    |         # Or:
    [^,]*     # match anything except commas (i.e. any remaining unquoted strings)
    /x', 
    $subject, $result, PREG_PATTERN_ORDER);
$result = $result[0];

But as you can see, this is ugly and hard to maintain. Use the right tool for the job.
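If a regex really is unavoidable, PCRE's backtracking-control verbs allow a shorter version with preg_split. This is a different technique from the pattern above, sketched here for comparison: the left alternative matches a whole single-quoted string (a backslash escapes whatever character follows it), and (*SKIP)(*FAIL) discards that match while making the engine resume after the string, so only commas outside quotes become split points.

```php
<?php
$subject = <<<'STR'
(some_array, 'some, string goes here','another_string','this string may contain "double quotes" but, it can\'t split, on escaped single quotes', anonquotedstring, 83448545, 1210597346 + '000', 1241722133 + '000')
STR;

// Left:  '...' with backslash escapes, then skip past it and fail.
// Right: any remaining comma, which is where the string gets split.
$pattern = "/'(?:\\\\.|[^'\\\\])*'(*SKIP)(*FAIL)|,/";

$fields = preg_split($pattern, $subject);
print_r($fields);
```

The pieces keep their quotes and the space after each comma, which is close to the output the question asks for; wrap the result in array_map('trim', ...) if the spaces are unwanted.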

The posts on this site are licensed under CC BY-SA 4.0. © 2020-2024 STACKOOM.COM