
C++ regular expression to split string into an array

I am trying to write a handler to extract parameters from a function call. The parameters are between () and are delimited by a comma ','. Parameters may also be arrays, which are comma-delimited and wrapped in [].

Examples of what I'm trying to decode:

    testA(aaaa, [bbbb,cccc,dddd], eeee)

or

    testB([aaaa,bbbb,cccc], dddd, [eeee,ffff])

Any combination and any number of parameters is possible. From these I want a list containing:

for testA:

    0 : aaaa
    1 : [bbbb,cccc,dddd]
    2 : eeee

for testB:

    0 : [aaaa,bbbb,cccc]
    1 : dddd
    2 : [eeee,ffff]

I can write a parser that gives me this, but I would prefer a regular expression.

This is my working solution, written in C++ for Qt 5.6:

    //Analyse the parameters: split on commas that are not inside [...]
    int intOpSB = 0;     //Current square-bracket nesting depth
    int intPStart = 0;   //Index where the current parameter starts
    for( int p = 0; p <= strParameters.length(); p++ ) {
        //p == length() is a virtual end position that flushes the last parameter,
        //so a trailing ']' no longer stops the final array from being added
        const bool blnEnd = (p == strParameters.length());
        const QChar qc = blnEnd ? QChar() : strParameters.at(p);

        if ( qc == clsXMLnode::mcucOpenSquareBracket ) {
            intOpSB++;
        } else if ( qc == clsXMLnode::mcucCloseSquareBracket ) {
            intOpSB--;
        } else if ( blnEnd || (intOpSB == 0 && qc == clsXMLnode::mcucArrayDelimiter) ) {
            //Take the text up to (not including) the delimiter, without surrounding blanks
            QString strParameter = strParameters.mid(intPStart, p - intPStart).trimmed();
            //Remove any quotes
            strParameter.remove(QLatin1Char('"'));
            strParameter.remove(QLatin1Char('\''));
            //Add the parameter, skipping empty ones such as those from "()"
            if ( !strParameter.isEmpty() ) {
                mslstParameters.append(strParameter);
            }
            //The next parameter starts just after the delimiter
            intPStart = p + 1;
        }
    }

References:

    mcucOpenSquareBracket is a constant defined as '['
    mcucCloseSquareBracket is a constant defined as ']'
    mcucArrayDelimiter is a constant defined as ','
    mslstParameters is a member defined as QStringList
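
For comparison, here is a minimal sketch of the same bracket-depth idea in standard C++ without Qt. The names trim and splitTopLevel are hypothetical. Because it tracks nesting depth like intOpSB above, it also copes with nested arrays such as [b,[c,d]], which a single regular expression cannot balance.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Removes leading and trailing spaces and tabs.
static std::string trim(const std::string& s) {
    const auto first = s.find_first_not_of(" \t");
    if (first == std::string::npos) return "";
    const auto last = s.find_last_not_of(" \t");
    return s.substr(first, last - first + 1);
}

// Splits on commas that are not inside [...] (hypothetical helper).
std::vector<std::string> splitTopLevel(const std::string& params) {
    std::vector<std::string> out;
    int depth = 0;           // current [ ] nesting depth
    std::size_t start = 0;   // start of the current parameter
    for (std::size_t i = 0; i <= params.size(); ++i) {
        const bool atEnd = (i == params.size());
        const char c = atEnd ? '\0' : params[i];
        if (c == '[') {
            ++depth;
        } else if (c == ']') {
            --depth;
        } else if (atEnd || (depth == 0 && c == ',')) {
            // A top-level comma (or the end) closes the current parameter
            const std::string piece = trim(params.substr(start, i - start));
            if (!piece.empty()) out.push_back(piece);
            start = i + 1;
        }
    }
    return out;
}
```

Called with "aaaa, [bbbb,cccc,dddd], eeee" it yields aaaa, [bbbb,cccc,dddd] and eeee.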

    #include <string>
    using namespace std::string_literals;  // needed for the ""s suffix

    auto term = "(?:[^,<]*)"s;
    auto chain = "(?:(?:"+term+",)*"+term+")"s;

    auto clause = "(?:(?:"+term+")|(?:<" + chain + ">))"s;

    auto re_str = "^(?:("+term+")|(?:<("+chain+")>))" "(?:|,((?:"+clause+",)*"+clause+"))";

re_str matches your whole string and splits the first term or chain off from the tail.

It returns up to 3 sub-matches. The first is a lone term. The second is a comma-delimited chain of terms (without the surrounding <>). Only one of these two is set. The third is the rest of the string after the first top-level ','.

The tail is either empty or another string that can be parsed with the same regular expression.

Chains of terms can be parsed by the same regular expression.
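
A minimal sketch of driving re_str in a loop, assuming std::regex with its default ECMAScript grammar; makeReStr and splitParameters are hypothetical names. Each regex_match peels off one lone term or <>-chain, and sub-match 3 (the tail) is fed back in until nothing is left.

```cpp
#include <regex>
#include <string>
#include <vector>

// Builds the same pattern as term/chain/clause/re_str (chains wrapped in <>).
std::string makeReStr() {
    const std::string term = "(?:[^,<]*)";
    const std::string chain = "(?:(?:" + term + ",)*" + term + ")";
    const std::string clause = "(?:(?:" + term + ")|(?:<" + chain + ">))";
    return "^(?:(" + term + ")|(?:<(" + chain + ")>))" "(?:|,((?:" + clause + ",)*" + clause + "))";
}

// Hypothetical helper: repeatedly splits the head off and re-parses the tail.
std::vector<std::string> splitParameters(std::string input) {
    static const std::regex re(makeReStr());
    std::vector<std::string> out;
    std::smatch m;
    while (!input.empty() && std::regex_match(input, m, re)) {
        if (m[1].matched) {
            out.push_back(m[1].str());               // lone term
        } else {
            out.push_back("<" + m[2].str() + ">");   // chain, brackets restored
        }
        input = m[3].str();  // tail; empty string when sub-match 3 did not participate
    }
    return out;
}
```

With "aaaa,<bbbb,cccc,dddd>,eeee" this yields aaaa, <bbbb,cccc,dddd> and eeee. Calling splitParameters again on "bbbb,cccc,dddd" splits the chain itself.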

Live example.

I matched <>-delimited chains of terms rather than []-delimited ones, because I got bored of \\\\ escapes.

You also want to discard whitespace around clauses. I omitted that; it should be easy to stitch in.
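
A sketch of both adjustments, under stated assumptions: C++11 raw string literals R"(...)" let \[, \] and \s be written without doubling backslashes, and \s* around each clause is one possible way to drop the surrounding whitespace (not the answer's own code). The term is made lazy (*?) so trailing blanks go to \s* instead of into the term. makeBracketReStr and splitBracketParameters are hypothetical names.

```cpp
#include <regex>
#include <string>
#include <vector>

// Same structure as re_str, but chains use [] and blanks around clauses are skipped.
std::string makeBracketReStr() {
    const std::string ws    = R"(\s*)";
    const std::string open  = R"(\[)";
    const std::string close = R"(\])";
    const std::string term  = R"((?:[^,\[\]]*?))";  // lazy: leaves trailing blanks to ws
    const std::string chain = "(?:(?:" + term + ",)*" + term + ")";
    const std::string clause = "(?:" + ws + "(?:" + term + "|" + open + chain + close + ")" + ws + ")";
    return "^" + ws + "(?:(" + term + ")|" + open + "(" + chain + ")" + close + ")" + ws
         + "(?:|,((?:" + clause + ",)*" + clause + "))";
}

// Hypothetical helper: splits a parameter list such as "aaaa, [bbbb,cccc], eeee".
std::vector<std::string> splitBracketParameters(std::string input) {
    static const std::regex re(makeBracketReStr());
    std::vector<std::string> out;
    std::smatch m;
    while (!input.empty() && std::regex_match(input, m, re)) {
        // Sub-match 1 is a lone term; sub-match 2 is the inside of [...]
        out.push_back(m[1].matched ? m[1].str() : "[" + m[2].str() + "]");
        input = m[3].str();  // tail after the first top-level comma
    }
    return out;
}
```

Applied to the question's testA and testB parameter lists, this gives exactly the lists the question asks for. Whitespace inside an array is kept verbatim.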

This regex should work:

    \[.*?\]|([^,\s]+)

See it on RegExr.
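
A minimal sketch of applying this regex from C++ with std::sregex_iterator, assuming the text between the parentheses has already been extracted (otherwise "testA(aaaa" and "eeee)" would come out as tokens). splitWithSimpleRegex is a hypothetical name. Note that the lazy .*? stops at the first ], so nested arrays such as [a,[b,c]] are not handled.

```cpp
#include <regex>
#include <string>
#include <vector>

// Collects every match of \[.*?\]|([^,\s]+) from a parameter list.
std::vector<std::string> splitWithSimpleRegex(const std::string& params) {
    static const std::regex re(R"(\[.*?\]|([^,\s]+))");
    std::vector<std::string> out;
    for (std::sregex_iterator it(params.begin(), params.end(), re), end; it != end; ++it) {
        out.push_back(it->str());  // whole match: either "[...]" or a plain token
    }
    return out;
}
```

Commas and blanks between parameters are simply skipped by the search, which is why no explicit delimiter handling is needed.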
