
Split a string on delimiters that are not inside specific characters

I have a string in the following format:

,"value","value2","3",("this is, a test"), "3"

How can I split it on commas that are not inside parentheses?

Edit: Sorry, a slight correction: inside the parentheses, the format is actually

 ,"value","value2","3",(THIS IS THE FORMAT "AND QUOTES, INSIDE"), "3"

The quotes are already enough to protect the comma, so you don't need the parentheses as well. If you take out the parentheses, str_getcsv() will parse it just fine. If you don't control the source, you can strip them yourself:

// Remove the parentheses that wrap the quoted field.
$str = str_replace('",("', '","', $str);
$str = str_replace('"), "', '", "', $str);
print_r(str_getcsv($str));
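The two str_replace() calls only match the exact sequences `",("` and `"), "`, so any change in spacing around the parentheses would slip through. A sketch under the assumption that every pair of parentheses wraps exactly one double-quoted field with no escaped quotes inside; the helper name `splitParenthesizedCsv()` is hypothetical. It unwraps the parentheses with preg_replace() and then hands the result to str_getcsv():

```php
<?php
// Hypothetical helper: unwrap parentheses that enclose exactly one
// double-quoted field, then let str_getcsv() handle the quoting.
// Assumes quotes inside a field are never escaped.
function splitParenthesizedCsv(string $str): array
{
    // ( "..." )  ->  "..."   (whitespace around the quotes is allowed)
    $unwrapped = preg_replace('/\(\s*("[^"]*")\s*\)/', '$1', $str);

    // Pass the escape character explicitly to avoid relying on the default.
    return str_getcsv($unwrapped, ',', '"', '\\');
}

$fields = splitParenthesizedCsv(',"value","value2","3",("this is, a test"), "3"');
print_r($fields);
```

The comma inside `"this is, a test"` stays in one field because, once the parentheses are gone, it sits inside an ordinary CSV enclosure.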

Edit for updated question:

You're still OK as long as there are no unescaped parentheses in the file. Convert closing parentheses to opening ones (str_getcsv() accepts only a single character as the enclosure), and then use the opening parenthesis as your quote character:

$str = str_replace(')', '(', $str);
print_r(str_getcsv($str, ',', '('));

Result:

Array
(
    [0] =>  
    [1] => "value"
    [2] => "value2"
    [3] => "3"
    [4] => THIS IS THE FORMAT "AND QUOTES, INSIDE"
    [5] =>  "3"
)
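A sketch that packages this trick into a hypothetical helper, `splitWithParenEnclosure()`, and additionally strips the double quotes that remain around the plain fields in the result above. Like the answer, it assumes the data contains no other parentheses:

```php
<?php
// Hypothetical helper built on the enclosure trick: turn ")" into "(",
// parse with "(" as the enclosure, then unwrap fields that are plain
// double-quoted values.
function splitWithParenEnclosure(string $str): array
{
    $fields = str_getcsv(str_replace(')', '(', $str), ',', '(', '\\');

    return array_map(
        // '"value"' and ' "3"' become 'value' and '3'; the parenthesized
        // field no longer starts with a quote, so it is left unchanged.
        fn (string $f): string => preg_replace('/^\s*"(.*)"\s*$/s', '$1', $f),
        $fields
    );
}

print_r(splitWithParenEnclosure(',"value","value2","3",(THIS IS THE FORMAT "AND QUOTES, INSIDE"), "3"'));
```

For the updated input this should give an empty first field followed by `value`, `value2`, `3`, `THIS IS THE FORMAT "AND QUOTES, INSIDE"` and `3`.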

The solutions above work fine, but I have one more:

preg_match_all('@(,)?("|(\())(.+?)((?(3)\)|"))(,)?@',$str,$arr);

The output for the original input string is:

Array
(
[0] => Array
    (
        [0] => ,"value",
        [1] => "value2",
        [2] => "3",
        [3] => ("this is, a test"),
        [4] => "3"
    )

[1] => Array
    (
        [0] => ,
        [1] => 
        [2] => 
        [3] => 
        [4] => 
    )

[2] => Array
    (
        [0] => "
        [1] => "
        [2] => "
        [3] => (
        [4] => "
    )

[3] => Array
    (
        [0] => 
        [1] => 
        [2] => 
        [3] => (
        [4] => 
    )

[4] => Array
    (
        [0] => value
        [1] => value2
        [2] => 3
        [3] => "this is, a test"
        [4] => 3
    )

[5] => Array
    (
        [0] => "
        [1] => "
        [2] => "
        [3] => )
        [4] => "
    )

[6] => Array
    (
        [0] => ,
        [1] => ,
        [2] => ,
        [3] => ,
        [4] => 
    )

)

So $arr[4] contains the extracted values.
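A hedged sketch wrapping this regex in a hypothetical `extractFields()` helper that returns only group 4. The same pattern should also cover the updated format from the question, because the conditional `(?(3)\)|")` closes the field on `)` whenever it was opened by `(`:

```php
<?php
// Hypothetical wrapper around the preg_match_all() regex above.
// Group 4 holds the text between the opening delimiter and the
// closing one: a double quote, or ")" when group 3 matched "(".
function extractFields(string $str): array
{
    preg_match_all('@(,)?("|(\())(.+?)((?(3)\)|"))(,)?@', $str, $arr);
    return $arr[4];
}

print_r(extractFields(',"value","value2","3",(THIS IS THE FORMAT "AND QUOTES, INSIDE"), "3"'));
```

The leading empty field is not returned, and an empty quoted value such as `""` would not be captured correctly, because `(.+?)` needs at least one character.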

Consider this code:

$str = ',"value","value2","3",(THIS IS THE FORMAT \) "AND QUOTES, INSIDE"), "3"';
$regex = '#(\(.*?(?<!\\\)\))\s*,|,#';
$arr = preg_split( $regex, $str, 0, PREG_SPLIT_DELIM_CAPTURE|PREG_SPLIT_NO_EMPTY );
print_r($arr);

OUTPUT:

Array
(
    [0] => "value"
    [1] => "value2"
    [2] => "3"
    [3] => (THIS IS THE FORMAT \) "AND QUOTES, INSIDE")
    [4] =>  "3"
)
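The same preg_split() call can be wrapped in a hypothetical helper that also trims the leading space left on fields such as ` "3"`. This is only a sketch around the answer's own regex:

```php
<?php
// Hypothetical wrapper around the preg_split() call above.
// Group 1 captures a whole "(...)" block (ignoring "\)" inside it),
// and DELIM_CAPTURE puts that block back into the result.
function splitOutsideParens(string $str): array
{
    $regex = '#(\(.*?(?<!\\\)\))\s*,|,#';
    $parts = preg_split($regex, $str, 0, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

    // Remove whitespace that surrounded the fields in the input.
    return array_map('trim', $parts);
}

print_r(splitOutsideParens(',"value","value2","3",(THIS IS THE FORMAT \) "AND QUOTES, INSIDE"), "3"'));
```

PREG_SPLIT_NO_EMPTY drops the empty first field, and a parenthesized group is only kept intact when a comma follows it, so a group at the very end of the string would still be split on its inner commas.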

Here's a simple tokenizer that you can use to split the input into strings and other characters:

preg_match_all('/"(?:[^\\\\"]|\\\\.)*"|[^"]/', $input, $tokens);

If you want to parse the input, iterate over the tokens and apply whatever syntax checks you need. You can identify the strings by the quote at the beginning and end of the token.
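One way to act on those tokens, sketched as a hypothetical `splitTopLevel()` function: track the parenthesis depth and split only on commas at depth 0. Because each quoted string arrives as a single token, commas inside quotes never reach the depth check. It assumes parentheses are balanced; nesting is allowed:

```php
<?php
// Hypothetical sketch: split on commas that are outside both quotes
// and parentheses, using the tokenizer regex from the answer.
function splitTopLevel(string $input): array
{
    // Tokens are whole quoted strings or single non-quote characters.
    preg_match_all('/"(?:[^\\\\"]|\\\\.)*"|[^"]/', $input, $tokens);

    $fields = [];
    $current = '';
    $depth = 0;
    foreach ($tokens[0] as $token) {
        if ($token === ',' && $depth === 0) {
            // A comma outside any parentheses ends the current field.
            $fields[] = trim($current);
            $current = '';
            continue;
        }
        if ($token === '(') {
            $depth++;
        } elseif ($token === ')' && $depth > 0) {
            $depth--;
        }
        $current .= $token;
    }
    $fields[] = trim($current);

    return $fields;
}

print_r(splitTopLevel(',"value","value2","3",(THIS IS THE FORMAT "AND QUOTES, INSIDE"), "3"'));
```

Unlike the single-regex approaches, this also keeps commas inside parentheses that are not quoted, e.g. `a,(b, (c, d)),e` becomes `a`, `(b, (c, d))` and `e`.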

preg_match_all("/,?\"(.*?)\",?/", $myString, $result);


Edit: The only quick solution I can think of for escaped quotes is to replace them before matching and put them back afterwards:

preg_match_all("/,?\"(.*?)\",?/", str_replace('\"', "'", $myString), $result);
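A sketch of the replace-and-restore idea as a hypothetical `extractQuoted()` helper. It uses preg_match_all() so every field is captured, and, as an assumption not in the answer, a NUL byte (`"\x00"`) as the placeholder instead of `'`, so genuine single quotes in the data cannot be mistaken for masked escapes when restoring:

```php
<?php
// Hypothetical sketch of "mask escaped quotes, match, restore".
// Assumes the input never contains a NUL byte, which is used as
// a stand-in for \" while the regex runs.
function extractQuoted(string $str): array
{
    $placeholder = "\x00";
    $masked = str_replace('\"', $placeholder, $str);

    preg_match_all('/,?"(.*?)",?/', $masked, $result);

    // Put the escaped quotes back into every captured value.
    return array_map(
        fn (string $v): string => str_replace($placeholder, '\"', $v),
        $result[1]
    );
}

print_r(extractQuoted('"a \"b\" c","d"'));
```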
