I have a list of files with names of the form name_x01_y01_000.h5 or name_y01_x01_000.h5.
What is the correct regular expression (or other method) to create a list of: file, x_ind, y_ind?
So far I have this code:
name = 'S3_FullBrain_Mosaic_'
type = '.h5'
wildc = name + '*' + type
files = glob.glob(wildc)
files = np.asarray(files)
wildre = 'r\"' +name+'x(?P<x_ind>\d+)_y(?P<y_ind>\d+).+\"'
m = re.match(wildre,files)
Since the glob already ensures the correct filename and extension, the regex need only match the indices. re.search allows a partial match. groupdict creates a dictionary with named groups as keys. The file key can be handled manually.
>>> file = 'S3_FullBrain_Mosaic_x02_y05_abcd.h5'
>>> result = re.search(r'x(?P<x_ind>\d+)_y(?P<y_ind>\d+)', file).groupdict()
>>> result
{'y_ind': '05', 'x_ind': '02'}
>>> result['file'] = file
>>> result
{'y_ind': '05', 'file': 'S3_FullBrain_Mosaic_x02_y05_abcd.h5', 'x_ind': '02'}
You can iterate over the files to produce the list of dicts. For this there's no need to create a numpy array, since I doubt you're going to do any heavy numerical calculations on the files list.
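The iteration described above could look roughly like the following. This is only a sketch: the hard-coded files list is a hypothetical stand-in for the result of glob.glob, and the names pattern and records are made up for illustration.

```python
import re

# Hypothetical file names standing in for the output of glob.glob(...)
files = [
    'S3_FullBrain_Mosaic_x02_y05_abcd.h5',
    'S3_FullBrain_Mosaic_x10_y01_abcd.h5',
]

# Compile once, since the same pattern is applied to every file name
pattern = re.compile(r'x(?P<x_ind>\d+)_y(?P<y_ind>\d+)')

records = []
for file in files:
    match = pattern.search(file)
    if match is None:
        # Skip names that do not contain the x.._y.. indices
        continue
    record = match.groupdict()   # {'x_ind': ..., 'y_ind': ...} as strings
    record['file'] = file
    records.append(record)
```

The indices stay as strings here (for example '02'); wrap them in int() if you need numbers.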
To handle both possible formats you will need to call re.search with two regexes. One will return None, the other a match on which you can use groupdict.
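One way to sketch the two-regex approach is shown below. The helper name parse_indices and the constants XY and YX are hypothetical, chosen only for this example. It relies on the fact that re.search returns None when there is no match, so the `or` falls through to the second pattern.

```python
import re

# One pattern per naming scheme; both expose the same group names
XY = re.compile(r'x(?P<x_ind>\d+)_y(?P<y_ind>\d+)')
YX = re.compile(r'y(?P<y_ind>\d+)_x(?P<x_ind>\d+)')

def parse_indices(file):
    """Return a dict with file, x_ind and y_ind, or None if neither pattern matches."""
    # A failed search returns None, so `or` tries the second pattern
    match = XY.search(file) or YX.search(file)
    if match is None:
        return None
    result = match.groupdict()
    result['file'] = file
    return result
```

Calling parse_indices on each name from the glob result and discarding the None entries gives the list of dicts for both formats.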
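You could use re.findall, which handles both orderings with a single pattern:
import re
names = ['name_x01_y01_000.h5', 'name_y01_x01_000.h5']
for name in names:
    # Each match is a (letter, digits) tuple, e.g. ('x', '01')
    matches = re.findall(r'_([xy])(\d+)(?=_)', name)
    # Build {'x': 1, 'y': 1} regardless of the order in the file name
    d = {k: int(v) for k, v in matches}
    d['name'] = name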