
Regular expression substitution in Python

I have a string

line = "haha (as jfeoiwf) avsrv arv (as qwefo) afneoifew"

From this I want to remove all instances of "(as...)" using a regular expression. I want the output to look like:

line = "haha avsrv arv afneoifew"

I tried:

line = re.sub(r'\(+as .*\)','',line)

But this yields:

line = "haha  afneoifew"

To get non-greedy behaviour, you have to use *? instead of *, i.e. re.sub(r'\(+as .*?\)','',line). To get the desired string, you also have to add a space after the closing parenthesis, i.e. re.sub(r'\(+as .*?\) ','',line).
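A minimal sketch comparing the greedy pattern from the question with the two non-greedy variants from this answer, run on the question's string (the variable names are just for illustration):

```python
import re

line = "haha (as jfeoiwf) avsrv arv (as qwefo) afneoifew"

# Greedy: .* runs as far as it can, so the match ends at the LAST ")"
greedy = re.sub(r'\(+as .*\)', '', line)

# Non-greedy: .*? stops at the first ")" it can, so each group is matched on its own
lazy = re.sub(r'\(+as .*?\)', '', line)

# Non-greedy plus a trailing space, which also removes the space after each group
lazy_with_space = re.sub(r'\(+as .*?\) ', '', line)

print(repr(greedy))           # 'haha  afneoifew'
print(repr(lazy))             # 'haha  avsrv arv  afneoifew'
print(repr(lazy_with_space))  # 'haha avsrv arv afneoifew'
```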

The problem is that your regexp matches this whole group: (as jfeoiwf) avsrv arv (as qwefo). The greedy .* runs all the way to the last closing parenthesis in the string, hence your result.
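To see exactly what the greedy pattern consumed, one option is to print the match itself. This sketch uses re.search for the single greedy match and re.findall for the separate lazy matches:

```python
import re

line = "haha (as jfeoiwf) avsrv arv (as qwefo) afneoifew"

# re.search returns the leftmost match; group(0) is the whole matched text
greedy_match = re.search(r'\(+as .*\)', line).group(0)

# With no capturing groups, re.findall returns every full non-overlapping match
lazy_matches = re.findall(r'\(+as .*?\)', line)

print(greedy_match)  # (as jfeoiwf) avsrv arv (as qwefo)
print(lazy_matches)  # ['(as jfeoiwf)', '(as qwefo)']
```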

You can use:

>>> import re
>>> line = "haha (as jfeoiwf) avsrv arv (as qwefo) afneoifew"
>>> line = re.sub(r'\(+as [a-zA-Z]*\)','',line)
>>> line
'haha  avsrv arv  afneoifew'

Hope it'll be helpful.
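The character-class pattern leaves two spaces wherever a group was removed, and [a-zA-Z]* will not match a group containing a space or a digit. A sketch that collapses the leftover whitespace afterwards; remove_as_groups is a hypothetical helper name used only for illustration:

```python
import re

def remove_as_groups(text):
    # Hypothetical helper: the character-class pattern from the answer above,
    # which only allows letters between "as " and ")"
    cleaned = re.sub(r'\(+as [a-zA-Z]*\)', '', text)
    # Collapse the runs of whitespace left behind where each group was removed
    return " ".join(cleaned.split())

result = remove_as_groups("haha (as jfeoiwf) avsrv arv (as qwefo) afneoifew")
print(result)  # haha avsrv arv afneoifew

# Limitation: a group containing a space is not matched by [a-zA-Z]*
unchanged = remove_as_groups("haha (as foo bar) end")
print(unchanged)  # haha (as foo bar) end
```

Note that " ".join(text.split()) also collapses any other whitespace runs (tabs, newlines, double spaces) in the original string, so it may not suit text where spacing matters.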

You were very close. You need to use the lazy quantifier ? after .*. By default, .* is greedy and captures the biggest group it possibly can; with the lazy quantifier it matches the smallest possible group instead.

line = re.sub(r'\(+as .*?\) ','',line)
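One caveat with the trailing-space pattern: a group at the very end of the string is not followed by a space, so it is left in place. A sketch of that edge case, with one possible alternative that consumes the whitespace before each group instead (via \s*):

```python
import re

pattern_trailing_space = r'\(+as .*?\) '
pattern_leading_space = r'\s*\(+as .*?\)'

ends_with_group = "haha (as jfeoiwf) end (as qwefo)"

# The last group has no space after it, so the trailing-space pattern skips it
kept_last = re.sub(pattern_trailing_space, '', ends_with_group)
print(kept_last)  # haha end (as qwefo)

# Consuming the whitespace before each group handles both positions
removed_all = re.sub(pattern_leading_space, '', ends_with_group)
print(removed_all)  # haha end

# The alternative still gives the desired output for the question's string
line = "haha (as jfeoiwf) avsrv arv (as qwefo) afneoifew"
question_result = re.sub(pattern_leading_space, '', line)
print(question_result)  # haha avsrv arv afneoifew
```

With the leading-space variant, a group at the very start of the string leaves a leading space behind, which .strip() can remove if needed.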

Try:

re.sub(r".\(as \w+\).", ' ', line)
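A sketch of the dot-wrapped pattern written as a raw string (in a non-raw string, \( triggers an invalid-escape warning in recent Python 3 versions). The surrounding dots each consume one character, and \w+ only allows a single word inside the parentheses, so a group at the end of the string is not matched:

```python
import re

# Raw string, so "\(" reaches the regex engine unchanged; each "." consumes
# one character on either side of the group, and the replacement puts back one space
pattern = r".\(as \w+\)."

middle = re.sub(pattern, ' ', "haha (as jfeoiwf) avsrv arv (as qwefo) afneoifew")
print(middle)  # haha avsrv arv afneoifew

# A group at the very end has no character after it, so the final "." cannot match
at_end = re.sub(pattern, ' ', "haha (as qwefo)")
print(at_end)  # haha (as qwefo)
```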

Try:

re.sub(r'\(as[^\)]*\)', '', line)
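One practical difference between [^)]* and the lazy .*? used in the other answers: . does not match a newline unless re.DOTALL is set, whereas a negated character class does. A short sketch with a made-up multi-line input:

```python
import re

# Made-up input where the group spans a line break
text = "haha (as foo\nbar) end"

# [^)]* matches any character except ")", newlines included
negated = re.sub(r'\(as[^)]*\)', '', text)
print(repr(negated))  # 'haha  end'

# "." does not match a newline by default, so the lazy pattern finds no match here
lazy_default = re.sub(r'\(as .*?\)', '', text)
print(repr(lazy_default))  # 'haha (as foo\nbar) end'

# With re.DOTALL, "." also matches a newline
lazy_dotall = re.sub(r'\(as .*?\)', '', text, flags=re.DOTALL)
print(repr(lazy_dotall))  # 'haha  end'
```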
