Correct Python regular expression to create double dict

I have a list of files with names like name_x01_y01_000.h5 or name_y01_x01_000.h5.

What is the correct regular expression (or other method) to create a list of: file, x_ind, y_ind?

So far I have this code:

name = 'S3_FullBrain_Mosaic_'
type = '.h5'

wildc = name + '*' + type
files = glob.glob(wildc)
files = np.asarray(files)

wildre = 'r\"' +name+'x(?P<x_ind>\d+)_y(?P<y_ind>\d+).+\"'
m = re.match(wildre,files)

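Since the glob already ensures the correct filename and extension, the regex need only match the indices. re.search allows a partial match anywhere in the string, unlike re.match, which only matches at the start. Note that the r prefix is part of Python's string-literal syntax and does not belong inside the pattern itself, and that the re functions take a single string, not an array of strings. Calling .groupdict() on the match object creates a dictionary with the named groups as keys. The file key can be added manually.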

>>> file = 'S3_FullBrain_Mosaic_x02_y05_abcd.h5'
>>> result = re.search(r'x(?P<x_ind>\d+)_y(?P<y_ind>\d+)', file).groupdict()
>>> result
{'y_ind': '05', 'x_ind': '02'}
>>> result['file'] = file
>>> result
{'y_ind': '05', 'file': 'S3_FullBrain_Mosaic_x02_y05_abcd.h5', 'x_ind': '02'}

You can iterate over the files to produce the list of dicts. For this there's no need to create a numpy array, since I doubt you're going to do any heavy numerical calculations on the files list.
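The loop described above could look roughly like the following. This is a minimal sketch: the `files` list is a hypothetical stand-in for the output of `glob.glob`, and the names `pattern` and `records` are chosen only for illustration.

```python
import re

# Hypothetical file names standing in for the result of glob.glob(...)
files = [
    'S3_FullBrain_Mosaic_x02_y05_abcd.h5',
    'S3_FullBrain_Mosaic_x10_y01_abcd.h5',
]

# Compile once, since the same pattern is applied to every file name
pattern = re.compile(r'x(?P<x_ind>\d+)_y(?P<y_ind>\d+)')

records = []
for file in files:
    match = pattern.search(file)
    if match is None:
        continue  # skip names that do not contain both indices
    record = match.groupdict()  # {'x_ind': ..., 'y_ind': ...} as strings
    record['file'] = file
    records.append(record)
```

The indices stay as strings here (for example '02'); wrap them in int() if numeric values are needed.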

To handle both possible formats you will need to call re.search with two regexes. One will return None, the other a match on which you can use groupdict.
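One way to sketch the two-regex approach is shown below. The helper name `parse_indices` and the constants `XY` and `YX` are hypothetical, and the sketch assumes the rest of the file name contains no other "x<digits>_y<digits>" or "y<digits>_x<digits>" sequence.

```python
import re

# One pattern per naming order; both use the same group names
XY = re.compile(r'x(?P<x_ind>\d+)_y(?P<y_ind>\d+)')
YX = re.compile(r'y(?P<y_ind>\d+)_x(?P<x_ind>\d+)')

def parse_indices(file):
    """Return a dict with x_ind, y_ind and file, or None if neither pattern matches."""
    # search() returns None on failure, so `or` falls through to the second pattern
    match = XY.search(file) or YX.search(file)
    if match is None:
        return None
    result = match.groupdict()
    result['file'] = file
    return result
```

Because both patterns share the same group names, the resulting dict has the same keys whichever order the indices appear in.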

You could use re.findall, which handles both orderings with a single pattern:

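import re

names = ['name_x01_y01_000.h5', 'name_y01_x01_000.h5']
results = []
for name in names:
    # Each match is an (axis, digits) pair such as ('x', '01'), in either order
    matches = re.findall(r'_([xy])(\d+)(?=_)', name)
    d = {k: int(v) for k, v in matches}  # e.g. {'x': 1, 'y': 1}
    d['name'] = name
    results.append(d)  # collect one dict per file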
