I have some instances of a class containing numpy arrays.
import numpy as np
import os.path as osp

class Obj:
    def __init__(self, file):
        self.file = file
        self.data = np.fromfile(file)  # raw file contents as a flat array
        self.basename = osp.basename(file)
I have a list of such objects, which I want to group by shape. I can do that using sort:
obj_list = [obj1, obj2, ..., objn]
obj_list.sort(key=lambda obj: obj.data.shape)
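If explicit groups are wanted rather than just a sorted list, one option (not part of my original attempt) is to pass the sorted list to itertools.groupby. This is only a minimal sketch: make_obj is a hypothetical stand-in for Obj, used so the example runs without files on disk, and only .basename and .data.shape are assumed.

```python
from itertools import groupby
from types import SimpleNamespace

def make_obj(basename, shape):
    """Hypothetical stand-in for Obj; only .basename and .data.shape are used."""
    return SimpleNamespace(basename=basename, data=SimpleNamespace(shape=shape))

def shape_of(obj):
    return obj.data.shape

obj_list = [
    make_obj("a.bin", (4,)),
    make_obj("b.bin", (2, 3)),
    make_obj("c.bin", (4,)),
]

# groupby only merges adjacent items, so the list must be sorted by the same key first.
obj_list.sort(key=shape_of)
groups = {shape: list(members) for shape, members in groupby(obj_list, key=shape_of)}
```

Here groups maps each shape to the list of objects that have it, in the order they appear after sorting.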
Now I have a second list, say obj_list_2. Its objects are initialized from different files, but the resulting arrays have the same shapes as those in the first list (though not in the same order), and the basenames are also the same.
To clarify, these are files loaded from different folders: every folder contains the same files, to which I applied different preprocessing.
If I sort them using the method shown above, objects with the same shape can end up in a different order in the two lists.
I want the two lists sorted by shape and also aligned according to their basename.
I thought about sorting by shape first and then by basename (or a function of it). Something like:
obj_list.sort(key=lambda obj: obj.data.shape)
obj_list.sort(key=lambda obj: obj.basename)
However, the second sort might undo the first one. They should be done together, somehow.
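One way to do both at once, sketched here under assumptions, is a single sort with a tuple key: Python compares tuples element by element, so shape decides the order and basename breaks ties. make_obj is a hypothetical stand-in for Obj so the example runs without files on disk.

```python
from types import SimpleNamespace

def make_obj(basename, shape):
    """Hypothetical stand-in for Obj; only .basename and .data.shape are used."""
    return SimpleNamespace(basename=basename, data=SimpleNamespace(shape=shape))

def sort_key(obj):
    # Shape is compared first; basename only matters when shapes are equal.
    return (obj.data.shape, obj.basename)

# Same basenames and shapes in both lists, but in a different initial order.
obj_list_1 = [make_obj("b.bin", (4,)), make_obj("a.bin", (2, 3)), make_obj("c.bin", (4,))]
obj_list_2 = [make_obj("c.bin", (4,)), make_obj("b.bin", (4,)), make_obj("a.bin", (2, 3))]

obj_list_1.sort(key=sort_key)
obj_list_2.sort(key=sort_key)

order_1 = [(o.data.shape, o.basename) for o in obj_list_1]
order_2 = [(o.data.shape, o.basename) for o in obj_list_2]
```

After sorting with the same key, order_1 and order_2 are identical, so the two lists line up index by index as long as every basename has the same shape in both lists.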
I tried with pandas, but I'm not that familiar with it. First I align them based on the basename, then I create a list of lists and pass it to pandas.
import pandas as pd
obj_list_of_list = [obj_list1, obj_list2]
obj_df = pd.DataFrame.from_records(obj_list_of_list)
What is missing is to group them by shape and extract the different groups.
You can create a dictionary mapping (file, shape) to a list of objects using collections.defaultdict:
from collections import defaultdict

d = defaultdict(list)
obj_list = [obj1, obj2, ..., objn]
for obj in obj_list:
    # Obj stores the path as `file`; it has no `filename` attribute
    d[(obj.file, obj.data.shape)].append(obj)
Similarly, you can group by shape only if you wish:
d_shape = defaultdict(list)
for obj in obj_list:
    d_shape[obj.data.shape].append(obj)
You can then access the unique shapes via d_shape.keys(), and the list of objects for a given shape via d_shape[some_shape]. The benefit of this solution is that its complexity is O(n), whereas sorting has higher complexity, e.g. O(n log n).
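To keep the two lists aligned as well, the same idea can be extended: index the second list by basename, then group pairs of matching objects by shape. The following is a minimal sketch under assumptions, not a definitive implementation: group_aligned, make_obj and the folder attribute are hypothetical names introduced for illustration, and every basename in the first list is assumed to exist in the second.

```python
from collections import defaultdict
from types import SimpleNamespace

def make_obj(basename, shape, folder):
    """Hypothetical stand-in for Obj; `folder` only marks which list it came from."""
    return SimpleNamespace(basename=basename, folder=folder,
                           data=SimpleNamespace(shape=shape))

def group_aligned(list_a, list_b):
    """Group (obj_a, obj_b) pairs sharing a basename, keyed by obj_a's shape."""
    by_name_b = {obj.basename: obj for obj in list_b}  # O(n) lookup table
    groups = defaultdict(list)
    for obj_a in list_a:
        obj_b = by_name_b[obj_a.basename]  # raises KeyError if no partner exists
        groups[obj_a.data.shape].append((obj_a, obj_b))
    return dict(groups)

list_a = [make_obj("x.bin", (4,), "raw"),
          make_obj("y.bin", (2, 3), "raw"),
          make_obj("z.bin", (4,), "raw")]
list_b = [make_obj("z.bin", (4,), "filtered"),
          make_obj("y.bin", (2, 3), "filtered"),
          make_obj("x.bin", (4,), "filtered")]

groups = group_aligned(list_a, list_b)
pairs_4 = [(a.basename, b.basename) for a, b in groups[(4,)]]
```

Each value in groups is a list of pairs, so unzipping a group with zip(*groups[shape]) gives two aligned sequences, one per folder, still in a single O(n) pass.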