
Python Regular Expression searching backwards

I need to extract a string from a directory like this:

import re

my_new_string = "C:\\Users\\User\\code\\Python\\final\\mega_1237665428090192022_cts.ascii"
ID = '1237665428090192022'
m = re.match(r'.*(\b\w+%s)(?<!.{%d})' % (ID, -1), my_new_string)
if m:
    print(m.group(1))

I need to extract 'mega' from my_new_string above. At the moment the code returns mega_1237665428090192022, so how do I get it to leave out the ID number?

To be honest, I don't understand how these expressions work, even after consulting the documentation. What does the r prefix do? And how does the (?<!.{%d}) part work?

edit: Thanks guys!

There are a couple of ways to do this, although I'm not sure you necessarily need a regex here. Here are some options:

>>> import os.path
>>> my_new_string = "C:\\Users\\User\\code\\Python\\final\\mega_1237665428090192022_cts.ascii"
>>> os.path.basename(my_new_string)
'mega_1237665428090192022_cts.ascii'
>>> basename = os.path.basename(my_new_string)
>>> basename.split('_')[0]
'mega'
>>> import re
>>> re.match(r'[A-Za-z]+', basename).group()
'mega'
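
One caveat on the session above: os.path.basename only splits on backslashes when Python is running on Windows. On Linux or macOS a backslash is an ordinary filename character, so the whole string would come back unchanged. A minimal sketch that should behave the same on any platform, assuming Python 3 and using pathlib.PureWindowsPath (my substitution, not part of the original answer):

```python
from pathlib import PureWindowsPath

my_new_string = "C:\\Users\\User\\code\\Python\\final\\mega_1237665428090192022_cts.ascii"

# PureWindowsPath applies Windows path rules regardless of the host OS,
# so the backslashes are always treated as separators.
basename = PureWindowsPath(my_new_string).name

# Everything before the first underscore is the prefix we want.
prefix = basename.split('_')[0]

print(basename)
print(prefix)
```

The split('_') step is unchanged from the answer; only the way the file name is obtained differs.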

I don't think you are looking for a negative lookahead or a negative lookbehind assertion. If anything, you want a positive lookahead: match only if an underscore or a digit DOES follow. For example, something like this:

>>> re.match(r'.*?(?=[_\d])', basename).group()
'mega'
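
To make the difference between these assertions concrete, here is a small sketch on the same file name. The variable names are mine and purely illustrative. A lookahead inspects what comes after the current position, a lookbehind inspects what comes before it, and neither consumes any characters.

```python
import re

basename = "mega_1237665428090192022_cts.ascii"

# Positive lookahead (?=...): lazily match up to, but not including,
# the first underscore or digit.
prefix = re.match(r'.*?(?=[_\d])', basename).group()

# Negative lookbehind (?<!...): underscores NOT preceded by a digit.
after_letter = [m.start() for m in re.finditer(r'(?<!\d)_', basename)]

# Positive lookbehind (?<=...): underscores that ARE preceded by a digit.
after_digit = [m.start() for m in re.finditer(r'(?<=\d)_', basename)]

print(prefix)        # mega
print(after_letter)  # [4]  (the underscore after "mega")
print(after_digit)   # [24] (the underscore after the ID)
```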

The r prefix simply makes a raw string literal: backslashes in it are kept as-is instead of starting escape sequences, so you don't need to double every backslash in a pattern.
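
A short illustration of what the prefix changes, using made-up strings rather than anything from the question:

```python
# Without the prefix, "\n" is an escape sequence for a single newline character.
plain = "C:\new"
# With the prefix, the backslash and the "n" stay as two separate characters.
raw = r"C:\new"

print(len(plain))  # 5
print(len(raw))    # 6

# For regex patterns, the raw form saves doubling each backslash.
same_pattern = ("\\d+" == r"\d+")
print(same_pattern)  # True
```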

>>> m = re.match(r'.*\b(\w+)_(%s)(?<!.{%d})' % (ID, -1), my_new_string)
>>> m.groups()
('mega', '1237665428090192022')

>>> m.group(1)
'mega'
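
On the (?<!.{%d}) part of the question: as far as I can tell, once -1 is substituted the pattern contains .{-1}, and Python's re module does not read {-1} as a repetition count. It is taken as literal text, so the negative lookbehind only rejects a match that is immediately preceded by some character plus the literal text {-1}, which never happens with this input. The sketch below checks that assumption by comparing the pattern with and without the lookbehind. The use of re.escape is my addition and only matters if ID could ever contain regex metacharacters.

```python
import re

my_new_string = "C:\\Users\\User\\code\\Python\\final\\mega_1237665428090192022_cts.ascii"
ID = '1237665428090192022'

# Pattern from the answer above, with -1 substituted for %d.
with_lookbehind = re.match(r'.*\b(\w+)_(%s)(?<!.{%d})' % (ID, -1), my_new_string)

# Same pattern with the lookbehind dropped. re.escape guards against
# metacharacters in ID (not needed here, since ID is all digits).
without_lookbehind = re.match(r'.*\b(\w+)_(%s)' % re.escape(ID), my_new_string)

print(with_lookbehind.groups())
print(without_lookbehind.groups())
```

If both calls print the same groups, the lookbehind can simply be removed. The real fix is moving the underscore and the ID outside the first capture group.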
