
Python reference to regex in parentheses

I have a text file that needs to have the letter 't' removed if it is not immediately preceded by a number.

I am trying to do this using re.sub and I have this:

import re

f=open('File.txt').read()
g=f
g=re.sub('([^0-9])t','',g)

This identifies the letters to be removed correctly, but it also removes the preceding character. How can I refer to the parenthesized regex in the replacement string? Thanks!

Use a lookbehind (or negative lookbehind) instead.

g=re.sub('(?<=[^0-9])t','',g)

or

g=re.sub('(?<![0-9])t','',g)

Three options:

g=re.sub('([^0-9])t','\\1',g)

or

g=re.sub('(?<=[^0-9])t','',g)

or

g=re.sub('(?<![0-9])t','',g)

The first option is what you asked for: a backreference to the captured string. In the replacement, '\\1' (or the raw string r'\1') refers to the first captured group.
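As a minimal sketch of the backreference, assuming a made-up sample string (not the contents of File.txt), the raw-string and escaped spellings should behave the same:

```python
import re

# Hypothetical sample text, used only for illustration.
sample = "a 3t bolt and a tent"

# Group 1 captures the non-digit before 't'; r'\1' writes it back,
# so only the 't' itself disappears.
kept = re.sub(r'([^0-9])t', r'\1', sample)

# Same replacement without a raw string: '\\1' is a backslash followed by 1.
kept_escaped = re.sub('([^0-9])t', '\\1', sample)

print(kept)
print(kept == kept_escaped)
```

The 't' in "3t" is left alone because the character before it is a digit.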

Lookarounds don't consume characters, so there is nothing to put back in the replacement. The second option uses a positive lookbehind and the third a negative lookbehind; neither includes the [^0-9] or [0-9] character in the match, so only the t is replaced. These are probably the better choice: the capturing version consumes the preceding character, so it can miss a t that directly follows another removed t (as in "tt"), because matches cannot overlap.
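To illustrate the consumed-character issue, here is a small sketch on a made-up sample with two consecutive t's (the sample string is an assumption, not from the question):

```python
import re

# Hypothetical sample: "tt" after a letter, plus a 't' after a digit.
sample = "butter 4t"

# Capturing version: the match "ut" consumes the 'u' and the first 't'.
# The second 't' would need that first 't' as its preceding character,
# but it is already consumed, so the second 't' survives.
with_group = re.sub(r'([^0-9])t', r'\1', sample)

# Lookbehind version: only the 't' is consumed, so each 't' is checked
# against the original preceding character.
with_lookbehind = re.sub(r'(?<![0-9])t', '', sample)

print(with_group)
print(with_lookbehind)
```

In both cases the t after the digit 4 is kept.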

The positive lookbehind makes sure that t has a non-digit character before it. The negative lookbehind makes sure that t does not have a digit character before it.
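One place where the two lookbehinds can give different results is a t at the very start of the string: the positive form needs some character before the t, while the negative form only needs the absence of a digit. A minimal sketch, with a made-up sample string:

```python
import re

# Hypothetical sample: starts with 't' and also has a 't' after a digit.
sample = "top 5t"

# Positive lookbehind: there is no character before the first 't',
# so the assertion fails and that 't' is kept.
positive = re.sub(r'(?<=[^0-9])t', '', sample)

# Negative lookbehind: "no digit before" is also true at the start
# of the string, so the first 't' is removed.
negative = re.sub(r'(?<![0-9])t', '', sample)

print(repr(positive))
print(repr(negative))
```

If a leading t should be removed as well, the negative lookbehind is likely the form you want.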
