
Group list of numpy arrays based on shape. Pandas?

I have some instances of a class containing numpy arrays.

import numpy as np
import os.path as osp
class Obj:
    def __init__(self, file):
        self.file = file
        self.data = np.fromfile(file)
        self.basename = osp.basename(file)

I have a list of such objects, which I want to group by shape. I can do that using sort:

obj_list = [obj1, obj2, ..., objn]
obj_list.sort(key=lambda obj: obj.data.shape)

Now I have a second list, say obj_list_2. Its objects are initialized from different files, but the resulting arrays have the same shapes as those in the first list (though not in the same order), and the basenames are also the same.

To clarify, these are files loaded from different folders. Every folder contains the same files, to which I applied different preprocessing.

If I sort them using the method shown above, objects with the same shape can end up in a different order in each list.

I want the two lists sorted by shape and also aligned by basename.

I thought about sorting first by shape and then by basename (or a function of it). Something like:

obj_list.sort(key=lambda obj: obj.data.shape)
obj_list.sort(key=lambda obj: obj.basename)

However, the second sort might screw up the first one. They should be done together, somehow.
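
One way to do both sorts together is a single tuple key, since Python compares tuples element by element. The sketch below is only an illustration: `Item` is a hypothetical stand-in for `Obj` that builds its array in memory instead of reading a file, and the basenames and shapes are made up.

```python
import numpy as np


class Item:
    """Hypothetical stand-in for Obj: holds an in-memory array instead of reading a file."""

    def __init__(self, basename, shape):
        self.basename = basename
        self.data = np.zeros(shape)


def sort_key(obj):
    # Tuples compare element by element: shape decides first, basename breaks ties.
    return (obj.data.shape, obj.basename)


# Same basenames and shapes in both lists, but in a different order.
list_1 = [Item("b.bin", (4,)), Item("a.bin", (2,)), Item("c.bin", (4,))]
list_2 = [Item("c.bin", (4,)), Item("b.bin", (4,)), Item("a.bin", (2,))]

list_1.sort(key=sort_key)
list_2.sort(key=sort_key)

order_1 = [(o.basename, o.data.shape) for o in list_1]
order_2 = [(o.basename, o.data.shape) for o in list_2]
```

After sorting, both lists are grouped by shape and ordered by basename within each shape, so they line up element by element as long as they contain the same (basename, shape) pairs. Because `list.sort` is stable, sorting by basename first and by shape second would give the same order in two passes.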

My final goal is to extract from the two lists the objects that have the same shape and the same basename.

I tried with pandas, but I'm not that familiar with it. First I align the lists by basename, then I create a list of lists and pass it to pandas.

import pandas as pd
obj_list_of_list = [obj_list1, obj_list2]
obj_df = pd.DataFrame.from_records(obj_list_of_list)

What is missing is grouping them by shape and extracting the different groups.
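
A possible pandas route, sketched under some assumptions: each list becomes a frame with one row per object, the two frames are inner-merged on basename and shape, and the result is grouped by shape. `Item` and `to_frame` are hypothetical names, and the shape is stored as a string here so that it behaves as an ordinary scalar key, since pandas treats tuples specially in some indexing operations.

```python
import numpy as np
import pandas as pd


class Item:
    """Hypothetical stand-in for Obj: holds an in-memory array instead of reading a file."""

    def __init__(self, basename, shape):
        self.basename = basename
        self.data = np.zeros(shape)


def to_frame(objs, column):
    # One row per object; the shape tuple is stored as its string form, e.g. "(4,)".
    return pd.DataFrame({
        "basename": [o.basename for o in objs],
        "shape": [str(o.data.shape) for o in objs],
        column: objs,
    })


list_1 = [Item("a.bin", (2,)), Item("b.bin", (4,)), Item("c.bin", (4,)), Item("d.bin", (3,))]
list_2 = [Item("c.bin", (4,)), Item("a.bin", (2,)), Item("b.bin", (4,)), Item("d.bin", (5,))]

# The default inner merge keeps only rows whose basename and shape occur in both frames.
merged = to_frame(list_1, "obj_1").merge(to_frame(list_2, "obj_2"), on=["basename", "shape"])

# One sub-frame per distinct shape, each holding the matched object pairs.
groups = {shape: frame for shape, frame in merged.groupby("shape")}
```

Here `d.bin` is dropped because its shape differs between the two lists, and `groups["(4,)"]` holds the rows for `b.bin` and `c.bin` with the matching objects in the `obj_1` and `obj_2` columns.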

You can create a dictionary mapping (basename, shape) to a list of objects using collections.defaultdict:

from collections import defaultdict

d = defaultdict(list)

obj_list = [obj1, obj2, ..., objn]

for obj in obj_list:
    # Key on the basename shared across folders, plus the array shape.
    d[(obj.basename, obj.data.shape)].append(obj)

Similarly, you can group by shape only if you wish:

d_shape = defaultdict(list)

for obj in obj_list:
    d_shape[obj.data.shape].append(obj)

You can then access the unique shapes via d_shape.keys() and the list of objects for a given shape via d_shape[some_shape]. The benefit of this solution is that its complexity is O(n), while sorting has higher complexity, e.g. O(n log n).
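
The same dictionary idea can be extended to two lists to reach the stated goal of extracting objects with the same shape and basename. This is a minimal sketch, not part of the original answer: `Item`, `index_by_key`, and `matched_pairs_by_shape` are hypothetical names, and it assumes each (basename, shape) pair occurs at most once per list.

```python
from collections import defaultdict

import numpy as np


class Item:
    """Hypothetical stand-in for Obj: holds an in-memory array instead of reading a file."""

    def __init__(self, basename, shape):
        self.basename = basename
        self.data = np.zeros(shape)


def index_by_key(objs):
    # Assumes each (basename, shape) pair occurs at most once per list.
    return {(o.basename, o.data.shape): o for o in objs}


def matched_pairs_by_shape(objs_1, objs_2):
    index_1 = index_by_key(objs_1)
    index_2 = index_by_key(objs_2)
    pairs = defaultdict(list)
    # Walk the first list in its own order and keep only keys present in both.
    for key, obj_1 in index_1.items():
        if key in index_2:
            _, shape = key
            pairs[shape].append((obj_1, index_2[key]))
    return pairs


list_1 = [Item("a.bin", (2,)), Item("b.bin", (4,)), Item("c.bin", (4,)), Item("d.bin", (3,))]
list_2 = [Item("c.bin", (4,)), Item("a.bin", (2,)), Item("b.bin", (4,)), Item("d.bin", (5,))]

pairs = matched_pairs_by_shape(list_1, list_2)
names = {shape: [a.basename for a, _ in matched] for shape, matched in pairs.items()}
```

Each value in `pairs` is a list of (object from the first list, object from the second list) tuples for one shape. Building the two indexes and walking the keys is a single pass over each list, so the O(n) behaviour is kept.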
