I'm trying to remove everything between two capital letters if the word 'year' appears between them.
Here is what I have:
import re
string = 'Sep 09 2018*57.10*58.05*Sep 08 2018*56.76*54.91*Sep 07 2018*58.14*55.20*Sep 06 2018*55.07*54.66*Sep 06 2018*0.91 higher than last year, blablabla*Sep 05 2018*54.71*53.70'
string = re.sub(r'([A-Z].*year)(.*?)(?=[A-Z])', '*', string)
And, what I expect to get:
string = 'Sep 09 2018*57.10*58.05*Sep 08 2018*56.76*54.91*Sep 07 2018*58.14*55.20*Sep 06 2018*55.07*54.66*Sep 05 2018*54.71*53.70'
So I want to remove everything from the last capital letter before 'year' up to the next capital letter after it, which here means '*Sep 06 2018*0.91 higher than last year, blablabla'. However, my code starts matching from the beginning of the string instead of starting at 'year' and looking backwards. The part after 'year' already works.
I'd appreciate any help fixing this.
You may use
[A-Z][^A-Z]*year[^A-Z]*(?=[A-Z])
See the regex demo
Details
[A-Z] - an uppercase letter
[^A-Z]* - 0+ chars other than uppercase letters
year - a literal word
[^A-Z]* - 0+ chars other than uppercase letters
(?=[A-Z]) - immediately to the right of the current location, there should be an uppercase letter.
In Python, use
string = re.sub(r'[A-Z][^A-Z]*year[^A-Z]*(?=[A-Z])', '', string)
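To see why the original attempt misbehaves and what the suggested pattern changes, here is a small self-contained sketch that runs both on the sample string from the question. The variable names text, greedy_result and cleaned are only illustrative and do not come from the original post.

```python
import re

# Sample data from the question, split across lines for readability.
text = (
    'Sep 09 2018*57.10*58.05*Sep 08 2018*56.76*54.91*'
    'Sep 07 2018*58.14*55.20*Sep 06 2018*55.07*54.66*'
    'Sep 06 2018*0.91 higher than last year, blablabla*'
    'Sep 05 2018*54.71*53.70'
)

# Original attempt: [A-Z] matches the very first 'S', and the greedy .*
# then runs all the way to 'year', so nearly the whole string is replaced.
greedy_result = re.sub(r'([A-Z].*year)(.*?)(?=[A-Z])', '*', text)

# Suggested pattern: [^A-Z]* cannot cross another capital letter, so the
# match can only start at the last capital letter before 'year'.
cleaned = re.sub(r'[A-Z][^A-Z]*year[^A-Z]*(?=[A-Z])', '', text)

print(greedy_result)  # *Sep 05 2018*54.71*53.70
print(cleaned)
```

Note that the lookahead requires a capital letter after the 'year' segment, so a segment at the very end of the string would not be matched by this pattern as written.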