Error when replacing a missing ')' using a negative lookahead regex in Python

I want to look ahead for a missing ')' and add it with re.sub, but I get strange results when using a negative lookahead:

import re

a='D, M, departementsråd (fr.o.m. 2018-11-22 t.o.m. 2021-09-30 E, A, chef för Statens haverikommission (fr.o.m. 1997-07-01 t.o.m. 1997-09-07)'
re.sub(r'(t\.o\.m\.\s*\d{4}-\d{1,2}-\d{1,2})(?!\))',r'\1\)',a)

Result:

D, M, departementsråd (fr.o.m. 2018-11-22 t.o.m. 2021-09-30\\) E, A, chef för Statens haverikommission (fr.o.m. 1997-07-01 t.o.m. 1997-09-0\\)7)

What I want:

D, M, departementsråd (fr.o.m. 2018-11-22 t.o.m. 2021-09-30) E, A, chef för Statens haverikommission (fr.o.m. 1997-07-01 t.o.m. 1997-09-07)

I want to add the missing ) after t.o.m. 2021-09-30 and leave the second date, which already has one, unchanged, but it doesn't work.

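You get that result because \d{1,2} leaves paths to explore through backtracking: the {1,2} quantifier allows it to give up a digit.

The part \d{1,2}(?!\)) matches 1 or 2 digits and asserts that what is directly to the right is not ). In 07) the engine first tries 07, where the lookahead fails, and then backtracks to match only 0. That 0 is followed by 7 rather than ), so the assertion succeeds and the match ends in the middle of the day number.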
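The backtracking can be seen directly with re.search. This is a minimal sketch that uses only the second half of the string from the question; the variable names s, pattern and matched are illustrative, not part of the original code.

```python
import re

# Second date from the question, which already has its closing parenthesis.
s = "(fr.o.m. 1997-07-01 t.o.m. 1997-09-07)"

# Original pattern without the capture group: lookahead directly after \d{1,2}.
pattern = r"t\.o\.m\.\s*\d{4}-\d{1,2}-\d{1,2}(?!\))"

m = re.search(pattern, s)
matched = m.group(0)

# The last \d{1,2} gave back the "7" so that the lookahead could succeed.
print(matched)  # t.o.m. 1997-09-0
```

The match stops after the 0, which is why the replacement text ended up between 0 and 7.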

What you can do is add a word boundary after the digits, \d{1,2}\b, so that the match cannot end between two digits:

t\.o\.m\.\s*\d{4}-\d{1,2}-\d{1,2}\b(?!\))

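In the replacement you can use the full match instead of group 1 and append a plain ). The ) needs no backslash in the replacement string; the \) in your replacement is where the backslashes in your output come from: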

\g<0>)
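Put together, the suggestion might look like the sketch below. It assumes the input string from the question; the names pattern and fixed are chosen here for illustration.

```python
import re

a = ('D, M, departementsråd (fr.o.m. 2018-11-22 t.o.m. 2021-09-30 '
     'E, A, chef för Statens haverikommission (fr.o.m. 1997-07-01 t.o.m. 1997-09-07)')

# \b after the day stops the engine from backtracking to a single digit,
# because there is no word boundary between "0" and "7".
pattern = r't\.o\.m\.\s*\d{4}-\d{1,2}-\d{1,2}\b(?!\))'

# \g<0> is the whole match; the ")" after it is a literal character.
fixed = re.sub(pattern, r'\g<0>)', a)
print(fixed)
```

Only the first date gets a ) appended; the second one already ends in ) and is rejected by the lookahead, so it is left unchanged.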

