In a directory I have some files:
temperature_Resu05_les_spec_r0.0300.0
temperature_Resu05_les_spec_r0.0350.0
temperature_Resu05_les_spec_r0.0400.0
temperature_Resu05_les_spec_r0.0450.0
temperature_Resu06_les_spec_r0.0300.0
temperature_Resu06_les_spec_r0.0350.0
temperature_Resu06_les_spec_r0.0400.0
temperature_Resu06_les_spec_r0.0450.0
temperature_Resu07_les_spec_r0.0300.0
temperature_Resu07_les_spec_r0.0350.0
temperature_Resu07_les_spec_r0.0400.0
temperature_Resu07_les_spec_r0.0450.0
temperature_Resu08_les_spec_r0.0300.0
temperature_Resu08_les_spec_r0.0350.0
temperature_Resu08_les_spec_r0.0400.0
temperature_Resu08_les_spec_r0.0450.0
temperature_Resu09_les_spec_r0.0300.0
temperature_Resu09_les_spec_r0.0350.0
temperature_Resu09_les_spec_r0.0400.0
temperature_Resu09_les_spec_r0.0450.0
I need a list of all the files that share the same identifier XXXX, as in _rXXXX. For example, one such list would be composed of
temperature_Resu05_les_spec_r0.0300.0
temperature_Resu06_les_spec_r0.0300.0
temperature_Resu07_les_spec_r0.0300.0
temperature_Resu08_les_spec_r0.0300.0
temperature_Resu09_les_spec_r0.0300.0
I don't know a priori what the XXXX values are going to be, so I can't iterate through them and match like that. I'm thinking this might best be handled with a regular expression. Any ideas?
Yes, regular expressions are a fun way to do it! It could look something like this:

import re

results = {}
for fname in fnames:
    # grab whatever follows the final "_r" as the identifier
    ident = re.search(r'.*_r(.*)', fname).group(1)
    if ident in results:
        results[ident].append(fname)
    else:
        results[ident] = [fname]

The results will be stored in a dictionary, results, indexed by the identifier.
I should add that this will only work as long as all the file names reliably have the _rXXXX structure. If there's any chance that a file name won't match that pattern, re.search will return None and calling .group(1) on it will raise an AttributeError, so you will have to check for that and act accordingly.
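A minimal sketch of that check, assuming fnames is a plain list of file names (the second entry is a made-up non-matching name for illustration):

```python
import re

fnames = [
    "temperature_Resu05_les_spec_r0.0300.0",
    "notes.txt",  # hypothetical name that lacks the _rXXXX structure
]

results = {}
for fname in fnames:
    m = re.search(r'.*_r(.*)', fname)
    if m is None:
        continue  # skip names that don't match the pattern
    # setdefault avoids the explicit "if ident in results" branch
    results.setdefault(m.group(1), []).append(fname)

print(results)  # {'0.0300.0': ['temperature_Resu05_les_spec_r0.0300.0']}
```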
No, a regex is not the best way here: your pattern is very straightforward, so just str.rsplit on the _r and use the right-hand element of the split as the key to group the data with. A defaultdict will do the grouping efficiently:
from collections import defaultdict
from pprint import pprint as pp

groups = defaultdict(list)
with open("yourfile") as f:
    for line in f:
        name = line.rstrip()  # strip the newline so it doesn't end up in the key
        groups[name.rsplit("_r", 1)[1]].append(name)

pp(list(groups.values()))
Which for your sample will give you:
[['temperature_Resu09_les_spec_r0.0450.0'],
['temperature_Resu05_les_spec_r0.0300.0',
'temperature_Resu06_les_spec_r0.0300.0',
'temperature_Resu07_les_spec_r0.0300.0',
'temperature_Resu08_les_spec_r0.0300.0',
'temperature_Resu09_les_spec_r0.0300.0'],
['temperature_Resu05_les_spec_r0.0400.0',
'temperature_Resu06_les_spec_r0.0400.0',
'temperature_Resu07_les_spec_r0.0400.0',
'temperature_Resu08_les_spec_r0.0400.0',
'temperature_Resu09_les_spec_r0.0400.0'],
['temperature_Resu05_les_spec_r0.0450.0',
'temperature_Resu06_les_spec_r0.0450.0',
'temperature_Resu07_les_spec_r0.0450.0',
'temperature_Resu08_les_spec_r0.0450.0'],
['temperature_Resu05_les_spec_r0.0350.0',
'temperature_Resu06_les_spec_r0.0350.0',
'temperature_Resu07_les_spec_r0.0350.0',
'temperature_Resu08_les_spec_r0.0350.0',
'temperature_Resu09_les_spec_r0.0350.0']]