Regex for word not between parentheses

Given a string, match everything that occurs after the first occurrence of a word, where that occurrence must not be inside any pair of parentheses (the same word, and other words, may still appear inside parentheses elsewhere). For example:

SELECT
t1.col1,
(SELECT t2.col1 FROM table2 t2
    WHERE t2.id IN(SELECT * FROM table5 WHERE id = t2.id)
) AS alias1,
t1.col2
----------
FROM
table1 t1,
(SELECT id FROM table3 t3 WHERE t3.id = t1.table3_id) t3,
table4 t4

I'm looking for everything AFTER the dashed line: specifically, everything after the first occurrence of the word FROM that does not appear anywhere within a pair of parentheses.

If a regex won't do it, I'll write some PHP to parse the statement, though I'm having a tough time with that as well! I guess to do this, I would have to tokenize the string by word AND by parentheses?
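The tokenizing idea in the question can be sketched roughly like this. It is a minimal sketch, not a full SQL parser: `find_after_top_level_word()` is a hypothetical name, tokens are assumed to be either a single parenthesis or a run of `\w` characters, the keyword comparison is made case-insensitive (an assumption, since SQL keywords are), and parentheses inside string literals or comments are still counted.

```php
<?php
/**
 * Hypothetical helper: return the text after the first occurrence of $word
 * that sits at parenthesis depth 0, or null if there is none.
 * Assumption: parentheses inside quotes or comments are not special-cased.
 */
function find_after_top_level_word(string $sql, string $word): ?string
{
    // Each token is either one parenthesis or a run of word characters,
    // so FROM_DATE is a single token and never equals FROM.
    // PREG_OFFSET_CAPTURE gives [text, byte offset] for every match.
    preg_match_all('/[()]|\w+/', $sql, $matches, PREG_OFFSET_CAPTURE);

    $depth = 0; // current nesting depth
    foreach ($matches[0] as [$token, $offset]) {
        if ($token === '(') {
            $depth++;
        } elseif ($token === ')') {
            $depth--;
        } elseif ($depth === 0 && strcasecmp($token, $word) === 0) {
            // Whole-word match outside any parentheses: return what follows it.
            return ltrim(substr($sql, $offset + strlen($token)));
        }
    }
    return null; // no top-level occurrence
}
```

Because the loop visits tokens rather than characters, word boundaries come for free from the tokenizing regex, and depth tracking handles nesting of any level.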

I think a regex might not be the best solution here, as regular expressions are notoriously difficult (or impossible) to use when nested parentheses are involved.
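For what it's worth, PHP's `preg_*` functions use PCRE, which supports recursive subpatterns, so a regex can skip balanced parentheses as a unit; it is arguably harder to read than a loop, though. A minimal sketch, assuming a hypothetical function name `after_top_level_from()` and a case-insensitive match:

```php
<?php
/**
 * Hypothetical helper: everything after the first FROM that is not inside
 * parentheses, or null if there is no such FROM.
 */
function after_top_level_from(string $sql): ?string
{
    // Group 1 matches one balanced "(...)" group; (?1) recurses into group 1,
    // so nested parentheses of any depth are consumed as a single unit.
    // The lazy (?:...)*? walks from the start of the string over either a
    // single non-parenthesis character or a whole group, which means
    // \bFROM\b can only match at nesting depth 0. Group 2 captures the rest.
    // Flags: i = case-insensitive, s = "." also matches newlines.
    $pattern = '/^(?:[^()]|(\((?:[^()]++|(?1))*\)))*?\bFROM\b\s*(.*)/is';
    return preg_match($pattern, $sql, $m) === 1 ? $m[2] : null;
}
```

An unmatched `(` stops the walk, so a FROM after an unbalanced opening parenthesis is not found; `preg_match()` returning `false` on an engine error is also mapped to `null`.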

I also think looping through each character is not the best approach, as it results in a lot of unnecessary iterations.

I think this is the best approach:

Find each occurrence of the given string and count the parentheses before that occurrence. If the number of opening parens equals the number of closing parens, the occurrence is not inside any parentheses, so it is the match you want. This results in far less looping, and you only check the positions you actually care about.

I made a function findWord that takes this approach. It works with your example, where $in is your SQL statement and $search is 'FROM', and returns the rest of the string starting at that FROM.

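function findWord( $in, $search ) {

    // The string already starts with the word: nothing precedes it.
    if( strpos($in, $search) === 0 ) return $in;

    $before = '';  // everything skipped so far
    // Jump to the next occurrence (offset 1 skips the one $in currently starts with).
    while( ($i = strpos($in, $search, 1)) !== false ) {
        $before .= substr($in, 0, $i);
        $in = substr($in, $i);  // $in now starts with $search

        // Equal numbers of '(' and ')' before this occurrence
        // means it is not inside any pair of parentheses.
        if( substr_count($before, '(') == substr_count($before, ')') )
            return $in;  // text from the match onward, including $search itself
    }

    return false;  // no occurrence outside parentheses
}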
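One caveat of plain `strpos()` is that it also finds FROM inside longer identifiers such as `FROM_DATE`. A minimal sketch of the same count-the-parens idea restricted to whole words, using `preg_match_all()` with `\b` in place of `strpos()`; `find_word_whole()` is a hypothetical name, and like findWord it returns the text starting at the match, or `false`:

```php
<?php
/**
 * Hypothetical variant of findWord(): only whole-word occurrences of
 * $search are considered, and the first one with balanced parens before
 * it wins.
 */
function find_word_whole(string $in, string $search)
{
    // \b on both sides rejects matches inside identifiers like FROM_DATE.
    $pattern = '/\b' . preg_quote($search, '/') . '\b/';
    // Collect [text, byte offset] for every whole-word occurrence.
    preg_match_all($pattern, $in, $matches, PREG_OFFSET_CAPTURE);

    foreach ($matches[0] as $match) {
        $offset = $match[1];
        $before = substr($in, 0, $offset);
        // Balanced counts before the match mean it is not inside parentheses.
        if (substr_count($before, '(') === substr_count($before, ')')) {
            return substr($in, $offset);
        }
    }
    return false; // no occurrence outside parentheses
}
```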

I'm going with a programmatic approach unless someone has a better answer.

/**
 * Find the portion of the SQL statement occurring after
 * the first occurrence of the word 'FROM' (which itself
 * does not appear within parens)
 */
public static function sql_after_from($sql) {
    $indent = 0;  // current parenthesis nesting depth
    $start = 0;   // if no top-level FROM is found, the whole statement is returned
    $len = strlen($sql);
    for($x = 0; $x < $len; $x++) {
        $c = $sql[$x]; // current character
        if($c == '(') $indent++;
        if($c == ')') $indent--;
        // do the last 4 characters spell FROM, outside any parens?
        if($indent == 0 && $x >= 3 && substr($sql, $x - 3, 4) == 'FROM') {
            $start = $x + 1;
            break; // go no further
        }
    }
    // everything after the first top-level FROM, minus leading whitespace
    return ltrim(substr($sql, $start));
}
