
Tcl split regex over multiple lines

I have a long RE to match dates in multiple files, and I would like to split it over multiple lines so it is easier to read and update. I store it in a variable and then use that variable in the regexp command.

```tcl
set ::eval::regexdate { \d[\/\.-]\d{2}[\/\.-]\d{4}|\d{2}[\/\.-]\d{2}[\/\.-]\d{4}|\d{4}[\/\.-]\d{2}[\/\.-]\d{2}|(([12]\d|3[01])|([12]\d|3[01])(th|nd|rd|st))\s(January|February|March|April|May|June|July|August|September|October|November|December)\s\d{4}|(JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)[\/\.-]\d{2}[\/\.-]\d{4} }
```

I then use it in the following regexp call:

```tcl
if {[regexp "($::eval::regexdate)" $linefromfile all date]} {
    # Do something...
}
```

This all works fine if the RE is set as one long string, but when I try to break it out over multiple lines using (?x), as outlined in this post:

regexp pattern across multiple lines

```tcl
set ::eval::regexdate {(?x)
    \d[\/\.-]\d{2}[\/\.-]\d{4}|
    \d{2}[\/\.-]\d{2}[\/\.-]\d{4}|
    \d{4}[\/\.-]\d{2}[\/\.-]\d{2}|
    (([12]\d|3[01])|([12]\d|3[01])(th|nd|rd|st))\s(January|February|March|April|May|June|July|August|September|October|November|December)\s\d{4}|
    (JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)[\/\.-]\d{2}[\/\.-]\d{4}
}
```

I get the following error:

`couldn't compile regular expression pattern: quantifier operand invalid`

I am not sure why this is happening. My understanding is that (?x) ignores all whitespace and comments, so it should just stitch the lines back together into the one long RE, no? Are the "|" operators causing an issue in the way I have split the RE up?
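That reading of (?x) holds as long as the option is the very first thing in the pattern. A minimal sketch to check it, assuming Tcl 8.x advanced regular expressions; the variable names `compact` and `expanded` and the sample strings are made up for illustration. It compares a one-line pattern with the same two alternatives spread over several lines with comments:

```tcl
# Two of the date alternatives on a single line, with no embedded options.
set compact {\d{2}[\/\.-]\d{2}[\/\.-]\d{4}|\d{4}[\/\.-]\d{2}[\/\.-]\d{2}}

# The same alternatives over several lines. Because (?x) opens the pattern,
# newlines, indentation and # comments inside it are all ignored.
set expanded {(?x)
    \d{2}[\/\.-]\d{2}[\/\.-]\d{4}  |   # day/month/year style
    \d{4}[\/\.-]\d{2}[\/\.-]\d{2}      # year/month/day style
}

# Hypothetical sample inputs: both patterns should agree on every one.
foreach s {"15/04/2023" "2023.04.15" "April 2023"} {
    set r1 [regexp $compact $s]
    set r2 [regexp $expanded $s]
    puts "compact=$r1 expanded=$r2 for $s"
}
```

The "|" characters at the ends of the lines are not a problem; in expanded syntax they behave exactly as in the one-line form.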

Any help would be greatly appreciated in figuring out why it won't work when using (?x).

Thanks

The problem is the way you use the regexdate variable in your regexp command. As the post you reference indicates, (?x) must be at the very start of the regular expression: Tcl only recognizes embedded options there, and anywhere else the "(?" sequence is rejected, which is what produces the "quantifier operand invalid" error. By using "($::eval::regexdate)" you put parentheses around the pattern, effectively turning it into ((?x)…), so the option is no longer at the start. Putting parentheses around the complete regular expression is not very useful anyway, as the regexp command already stores the full match in the first match variable handed to it.
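A minimal sketch of the failure and the fix side by side, assuming Tcl 8.x; `body` is a made-up stand-in for the real date pattern. The first call keeps (?x) at the start of the pattern, the second wraps it in an extra group the way "($::eval::regexdate)" does:

```tcl
# Stand-in pattern body: the spaces around the slash are only ignored
# when expanded syntax is active.
set body { \d{2} / \d{2} }

# (?x) is the very first thing in the pattern: it compiles and matches.
puts [regexp "(?x)$body" "12/34"]   ;# prints 1

# Behind an extra "(" the option is no longer recognised, so the pattern
# fails to compile; catch turns that error into a message we can print.
if {[catch {regexp "((?x)$body)" "12/34"} msg]} {
    puts $msg   ;# the "quantifier operand invalid" compile error
}
```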

So, either omit the parentheses and use the complete match as the date:

```tcl
regexp $::eval::regexdate $linefromfile date
```
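Put together, the first option might look like the sketch below. It assumes the `::eval` namespace still has to be created, and the sample lines in the `foreach` are invented stand-ins for lines read from a file:

```tcl
# The ::eval namespace must exist before a variable can be set inside it.
namespace eval ::eval {}

# (?x) stays as the first thing in the pattern.
set ::eval::regexdate {(?x)
    \d[\/\.-]\d{2}[\/\.-]\d{4}|
    \d{2}[\/\.-]\d{2}[\/\.-]\d{4}|
    \d{4}[\/\.-]\d{2}[\/\.-]\d{2}|
    (([12]\d|3[01])|([12]\d|3[01])(th|nd|rd|st))\s(January|February|March|April|May|June|July|August|September|October|November|December)\s\d{4}|
    (JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)[\/\.-]\d{2}[\/\.-]\d{4}
}

# Hypothetical sample lines; no extra parentheses in the call, so the
# whole match lands directly in "date".
foreach linefromfile {
    "Invoice dated 2023-04-15 attached"
    "Signed on 21st March 2022"
    "No date here"
} {
    if {[regexp $::eval::regexdate $linefromfile date]} {
        puts "found: $date"
    } else {
        puts "no match: $linefromfile"
    }
}
```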

Or move the (?x) out of the variable and into the call, in front of the added parentheses:

```tcl
set ::eval::regexdate {
    \d[\/\.-]\d{2}[\/\.-]\d{4}|
    \d{2}[\/\.-]\d{2}[\/\.-]\d{4}|
    \d{4}[\/\.-]\d{2}[\/\.-]\d{2}|
    (([12]\d|3[01])|([12]\d|3[01])(th|nd|rd|st))\s(January|February|March|April|May|June|July|August|September|October|November|December)\s\d{4}|
    (JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)[\/\.-]\d{2}[\/\.-]\d{4}
}
```

```tcl
if {[regexp "(?x)($::eval::regexdate)" $linefromfile all date]} {
    # Do something...
}
```
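Tcl's `regexp` command also accepts an `-expanded` switch, documented as equivalent to the (?x) embedded option. Because it is a command switch rather than part of the pattern text, the original "($::eval::regexdate)" call can keep its parentheses. A sketch, using an abbreviated two-alternative version of the pattern and an invented sample line:

```tcl
namespace eval ::eval {}

# Abbreviated pattern body with no embedded option: its layout whitespace
# is only ignored because of (?x) or -expanded at the call site.
set ::eval::regexdate {
    \d{4}[\/\.-]\d{2}[\/\.-]\d{2}|
    (JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)[\/\.-]\d{2}[\/\.-]\d{4}
}

set linefromfile "Report filed on MAR-05-2021 by the clerk"

# (?x) prepended in the pattern string, as in the call above.
if {[regexp "(?x)($::eval::regexdate)" $linefromfile all date]} {
    puts "with (?x) prefix: $date"
}

# Same effect via the -expanded switch, with the wrapping group intact.
if {[regexp -expanded "($::eval::regexdate)" $linefromfile all date]} {
    puts "with the expanded switch: $date"
}
```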


 