Group list of numpy arrays based on shape. Pandas?
I have some instances of a class containing numpy arrays.
import numpy as np
import os.path as osp
class Obj():
    def __init__(self, file):
        self.file = file
        # Read the file's raw contents into a 1-D array (np.fromfile defaults to float64)
        self.data = np.fromfile(file)
        self.basename = osp.basename(file)
I have a list of such objects, which I want to group by shape. I can do that using sort:
obj_list = [obj1, obj2, ..., objn]
obj_list.sort(key=lambda obj: obj.data.shape)
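Sorting places objects of equal shape next to each other, but it does not yet split the list into separate groups. One possible way to extract them is `itertools.groupby` on the sorted list. This is a minimal sketch: `Item` is a hypothetical stand-in for `Obj` that builds its array from a shape instead of reading a file, and the basenames are made up.

```python
from itertools import groupby

import numpy as np


class Item:
    """Hypothetical stand-in for Obj that needs no file on disk."""

    def __init__(self, basename, shape):
        self.basename = basename
        self.data = np.zeros(shape)


def shape_of(obj):
    return obj.data.shape


obj_list = [Item("a.bin", (2, 3)), Item("b.bin", (4,)), Item("c.bin", (2, 3))]

# groupby only merges adjacent items, so sort by the same key first.
obj_list.sort(key=shape_of)
groups = {shape: list(members) for shape, members in groupby(obj_list, key=shape_of)}

for shape, members in groups.items():
    print(shape, [m.basename for m in members])
```

Each value in `groups` is a plain list, so a group can be looked up directly, for example `groups[(2, 3)]`.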
Now I have a second list, say obj_list_2. The objects in obj_list_2 are initialized from different files, but the resulting arrays have the same shapes as those in the first list (though not in the same order), and the basenames are also the same.
(To clarify, these are files loaded from different folders. Every folder contains the same files, to which I applied different preprocessing.)
If I sort them using the method shown above, I can end up with objects of the same shape in a different order in each list.
I want the two lists sorted based on shape and also aligned according to their basename.
I thought about first doing a sort based on the shape, followed by one based on the basename (or a function of it). Something like
obj_list.sort(key=lambda obj: obj.data.shape)
obj_list.sort(key=lambda obj: obj.basename)
However, the second sort might screw up the first one. They should be done together, somehow.
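One way to do both sorts together is a single key that returns a tuple, since Python compares tuples element by element. The sketch below assumes a hypothetical `Item` stand-in for `Obj` (no files needed) and made-up folder names `raw` and `filtered`; it is an illustration, not the only way to do this.

```python
import numpy as np


class Item:
    """Hypothetical stand-in for Obj that needs no file on disk."""

    def __init__(self, folder, basename, shape):
        self.file = f"{folder}/{basename}"
        self.basename = basename
        self.data = np.zeros(shape)


def shape_then_name(obj):
    # Tuples compare element by element: shape first, basename breaks ties.
    return (obj.data.shape, obj.basename)


# Same basenames and shapes in both folders, listed in different orders.
obj_list = [
    Item("raw", "b.bin", (2, 3)),
    Item("raw", "c.bin", (4,)),
    Item("raw", "a.bin", (2, 3)),
]
obj_list_2 = [
    Item("filtered", "c.bin", (4,)),
    Item("filtered", "a.bin", (2, 3)),
    Item("filtered", "b.bin", (2, 3)),
]

obj_list.sort(key=shape_then_name)
obj_list_2.sort(key=shape_then_name)

# After sorting with the same key, position i in both lists has the same basename.
for first, second in zip(obj_list, obj_list_2):
    print(first.data.shape, first.file, second.file)
```

Because `list.sort` is stable, sorting by basename first and then by shape should give the same order; the tuple key just does it in one pass.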
I tried with pandas, but I'm not that familiar with it. First I align the lists based on the basename, then I create a list of lists and pass it to pandas.
import pandas as pd
obj_list_of_list = [obj_list1, obj_list2]
obj_df = pd.DataFrame.from_records(obj_list_of_list)
What is missing is to group them by shape and extract the different groups.
You can create a dictionary mapping (file, shape) to a list of objects using collections.defaultdict:
from collections import defaultdict
d = defaultdict(list)
obj_list = [obj1, obj2, ..., objn]
for obj in obj_list:
    # Obj defines `file` and `basename`, not `filename`; basename is the same across folders
    d[(obj.basename, obj.data.shape)].append(obj)
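To see how such a dictionary can line up the two lists from the question, here is a minimal sketch that fills one `defaultdict` from both lists. It keys on `basename`, which is an assumption about the intent, since that is the attribute shared across folders. `Item` and the folder names `raw` and `filtered` are hypothetical stand-ins so the example runs without any files.

```python
from collections import defaultdict

import numpy as np


class Item:
    """Hypothetical stand-in for Obj that needs no file on disk."""

    def __init__(self, folder, basename, shape):
        self.file = f"{folder}/{basename}"
        self.basename = basename
        self.data = np.zeros(shape)


obj_list = [Item("raw", "b.bin", (2, 3)), Item("raw", "a.bin", (4,))]
obj_list_2 = [Item("filtered", "a.bin", (4,)), Item("filtered", "b.bin", (2, 3))]

# One pass over both lists; objects sharing basename and shape land under the same key.
pairs = defaultdict(list)
for obj in obj_list + obj_list_2:
    pairs[(obj.basename, obj.data.shape)].append(obj)

for (basename, shape), members in pairs.items():
    print(basename, shape, [m.file for m in members])
```

Within each key the objects keep the order in which they were appended, so the entry from `obj_list` comes before its counterpart from `obj_list_2`.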
Similarly, you can group by shape only, if you wish:
d_shape = defaultdict(list)
for obj in obj_list:
    d_shape[obj.data.shape].append(obj)
You can then access the unique shapes via d_shape.keys(), and access the list of objects with a given shape via d_shape[some_shape]. The benefit of such a solution is that its complexity is O(n), while sorting has higher complexity, e.g. O(n log n).