Python regular expression on single quoted string with escaped single quotes

Suppose we have some input like this (it's just an example; whether it makes sense doesn't matter):

data = r"(((column_1 + 7.45) * 3) <>    column_2 - ('string\'1' / 2))"

Well, I need to use Python's re module to match a string that starts and ends with ' and may contain escaped single quotes, as in the example above. So the result should be string\'1. How can I achieve this?
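A quick check of the sample data: in an ordinary Python string literal, \' is just an escaped quote, so no backslash survives into the string. A raw string keeps the backslash, which is presumably what the example intends (that reading is an assumption). The variable names below are illustrative.

```python
# In an ordinary (non-raw) literal, \' is only an escaped quote,
# so the backslash never reaches the string.
plain = "('string\'1' / 2)"
# A raw literal keeps the backslash as a real character.
raw = r"('string\'1' / 2)"

print(plain)                       # ('string'1' / 2)
print(raw)                         # ('string\'1' / 2)
print("\\" in plain, "\\" in raw)  # False True
```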

EDIT: I am using the PLY library, so the regex has to go into a token rule like this:

def t_leftOperand_arithmetic_rightOperand_STRING(self, t):
    r'<regex>'
    t.lexer.pop_state()
    return t
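PLY reads a token rule's regex from the function's docstring and, by default, compiles its rules with re.VERBOSE. The sketch below imitates that lookup with only the standard re module, so it runs without PLY installed. The class name Lexer is a hypothetical stand-in, and the pattern is the one suggested in the answer. Because the pattern contains single quotes, the docstring is written as a double-quoted raw string rather than r'...'.

```python
import re


class Lexer:
    # Hypothetical container class; only the rule method mirrors the question.
    def t_leftOperand_arithmetic_rightOperand_STRING(self, t):
        r"'[^'\\]*(?:\\[\S\s][^'\\]*)*'"
        t.lexer.pop_state()
        return t


# PLY takes the regex from the rule's docstring; read it the same way here.
pattern = Lexer.t_leftOperand_arithmetic_rightOperand_STRING.__doc__
# re.VERBOSE mirrors PLY's default flags; the pattern has no spaces or '#',
# so verbose mode does not change its meaning.
string_re = re.compile(pattern, re.VERBOSE)

data = r"(((column_1 + 7.45) * 3) <>    column_2 - ('string\'1' / 2))"
print(string_re.search(data).group(0))  # 'string\'1'
```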

I believe you also have to account for the escape character itself being escaped: in 'abc\\' the backslash is escaped, so the quote after it does close the string.

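For that, you'd need this regex:

'[^'\\]*(?:\\[\S\s][^'\\]*)*'

Written as an ordinary (non-raw) Python string literal, every backslash has to be doubled:

"'[^'\\\\]*(?:\\\\[\\S\\s][^'\\\\]*)*'"

In a raw string, such as a PLY docstring, the pattern can be used as-is, delimited with double quotes (r"...") because it contains single quotes.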
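A minimal sketch applying the pattern with re.findall. [\S\s] matches any character, including a newline, so no DOTALL flag is needed. For contrast, it also runs a naive pattern of my own (not from the answer) that treats any quote not preceded by a backslash as the end of the string; that shortcut goes wrong as soon as the backslash is itself escaped.

```python
import re

# Unrolled loop: a quote, ordinary characters, then any number of
# (backslash + any character, ordinary characters), then a closing quote.
STRING_RE = re.compile(r"'[^'\\]*(?:\\[\S\s][^'\\]*)*'")

# Naive alternative for comparison: end at the first quote not preceded by '\'.
NAIVE_RE = re.compile(r"'.*?(?<!\\)'")

data = r"(((column_1 + 7.45) * 3) <>    column_2 - ('string\'1' / 2))"
print(STRING_RE.findall(data))   # ["'string\\'1'"]

# Here the backslash is escaped, so the first string is 'a\\' and 'b' is separate.
tricky = r"'a\\' 'b'"
print(STRING_RE.findall(tricky))  # ["'a\\\\'", "'b'"]
print(NAIVE_RE.findall(tricky))   # ["'a\\\\' '"]  (runs past the real end)
```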


Input

'''Set 1 - this
is another
mul\'tiline
string'''
'''Set 2 - this
is' a\\nother
mul\'''tiline
st''ring'''
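To check the "Matches found per iteration: 9" figure with Python's own re engine, this sketch runs the three patterns benchmarked below over the input above, held in a raw string so the backslashes survive. The dictionary keys are my own labels, not names from the benchmark tool; re.S is Python's spelling of the s (DOTALL) option.

```python
import re

# The benchmark input, as a raw string so the backslashes are kept.
TEXT = r"""'''Set 1 - this
is another
mul\'tiline
string'''
'''Set 2 - this
is' a\\nother
mul\'''tiline
st''ring'''"""

PATTERNS = {
    "unrolled": re.compile(r"'[^'\\]*(?:\\[\S\s][^'\\]*)*'"),
    "alternation": re.compile(r"'(?:[^'\\]|\\.)*'", re.S),
    "alternation+": re.compile(r"'(?:[^'\\]+|\\.)*'", re.S),
}

results = {name: rx.findall(TEXT) for name, rx in PATTERNS.items()}
for name, found in results.items():
    print(name, len(found))  # 9 for each pattern
```

All three patterns describe the same strings, so they also return identical match lists; several of the nine matches are empty '' pairs produced by the triple quotes.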

Benchmark:

Regex1:   '[^'\\]*(?:\\[\S\s][^'\\]*)*'
Options:  < none >
Completed iterations:   400  /  400     ( x 1000 )
Matches found per iteration:   9
Elapsed Time:    5.00 s,   4995.27 ms,   4995267 µs


Regex2:   '(?:[^'\\]|\\.)*'
Options:  < s >
Completed iterations:   400  /  400     ( x 1000 )
Matches found per iteration:   9
Elapsed Time:    7.00 s,   7000.68 ms,   7000680 µs

Additional regex, for testing only. As @ridgerunner points out, the nested quantifier ([^'\\]+ inside (...)*) can cause catastrophic backtracking when a string has no closing quote.

Regex3:   '(?:[^'\\]+|\\.)*'
Options:  < s >
Completed iterations:   400  /  400     ( x 1000 )
Matches found per iteration:   9
Elapsed Time:    5.45 s,   5449.72 ms,   5449716 µs
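The timings above come from a regex benchmarking tool, not from Python. A rough way to repeat the comparison with Python's re engine is timeit, sketched below; absolute times, and possibly the ranking, will vary by machine and Python version. The labels and the default iteration count are my own choices.

```python
import re
import timeit

# The benchmark input, as a raw string so the backslashes are kept.
TEXT = r"""'''Set 1 - this
is another
mul\'tiline
string'''
'''Set 2 - this
is' a\\nother
mul\'''tiline
st''ring'''"""

# Labels are illustrative; re.S is the "s" (DOTALL) option.
PATTERNS = {
    "unrolled": re.compile(r"'[^'\\]*(?:\\[\S\s][^'\\]*)*'"),
    "alternation": re.compile(r"'(?:[^'\\]|\\.)*'", re.S),
    "alternation+": re.compile(r"'(?:[^'\\]+|\\.)*'", re.S),
}


def bench(number=10_000):
    """Return total seconds for `number` findall passes of each pattern over TEXT."""
    return {
        name: timeit.timeit(lambda rx=rx: rx.findall(TEXT), number=number)
        for name, rx in PATTERNS.items()
    }


if __name__ == "__main__":
    for name, seconds in bench().items():
        print(f"{name:>12}: {seconds:.3f} s")
```

Every string in this input is terminated, so the backlog-prone third pattern stays fast here; the backtracking risk shows up only on unterminated input.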
