Group list of numpy arrays based on shape. Pandas?

I have some instances of a class containing numpy arrays.

import numpy as np
import os.path as osp

class Obj:
    def __init__(self, file):
        self.file = file
        self.data = np.fromfile(file)
        self.basename = osp.basename(file)

I have a list of such objects, which I want to group by shape. I can do that using sort:

obj_list = [obj1, obj2, ..., objn]
obj_list.sort(key=lambda obj: obj.data.shape)

Now I have a second list, say obj_list_2. Its objects are initialized from different files, but the resulting arrays have the same shapes as those in the first list (though not in the same order), and the basenames are also the same.

To clarify, these are files loaded from different folders. Every folder contains the same files, to which I applied different preprocessing.

If I sort them using the method shown above, I can end up with objects of the same shape appearing in a different order in the two lists.

I want the two lists sorted based on shape and also aligned according to their basename.

I thought about first doing a sort based on shape, followed by one based on basename (or a function of it). Something like:

obj_list.sort(key=lambda obj: obj.data.shape)
obj_list.sort(key=lambda obj: obj.basename)

However, the second sort might mess up the first one. They should be done together, somehow.
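
One way the two sorts could be done together is a single sort with a tuple key, since Python compares tuples element by element. The sketch below is only an illustration: the make helper and the sample basenames are hypothetical stand-ins for Obj instances, and only the data and basename attributes are assumed.

```python
import numpy as np
from types import SimpleNamespace

# Hypothetical stand-in for Obj: only .data and .basename are needed here.
def make(basename, shape):
    return SimpleNamespace(basename=basename, data=np.zeros(shape))

obj_list = [make("b.dat", (4,)), make("a.dat", (2, 3)),
            make("c.dat", (2, 3)), make("d.dat", (4,))]
obj_list_2 = [make("c.dat", (2, 3)), make("d.dat", (4,)),
              make("a.dat", (2, 3)), make("b.dat", (4,))]

def sort_key(obj):
    # Shape is compared first; basename breaks ties within the same shape.
    return (obj.data.shape, obj.basename)

obj_list.sort(key=sort_key)
obj_list_2.sort(key=sort_key)

order_1 = [(o.data.shape, o.basename) for o in obj_list]
order_2 = [(o.data.shape, o.basename) for o in obj_list_2]
```

Because both lists use the same key, they end up in the same order as long as they contain the same (shape, basename) combinations. Python's sort is also stable, so sorting by basename first and by shape second would give the same result.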

My final goal is to extract from the two lists the objects that have the same shape and the same basename.

I tried with pandas, but I'm not that familiar with it. First I align the lists based on basename, then I create a list of lists and pass it to pandas.

import pandas as pd
obj_list_of_list = [obj_list1, obj_list2]
obj_df = pd.DataFrame.from_records(obj_list_of_list)

What is missing is to group them by shape and extract the different groups.
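
A possible pandas route for the missing step, sketched under assumptions: instead of a list of lists, each list becomes a small DataFrame with basename and shape columns, the two frames are merged on those columns, and the result is grouped by shape. The make and to_frame helpers and the column names obj_1 and obj_2 are hypothetical, chosen only for this illustration.

```python
import numpy as np
import pandas as pd
from types import SimpleNamespace

# Hypothetical stand-in for Obj: only .data and .basename are needed here.
def make(basename, shape):
    return SimpleNamespace(basename=basename, data=np.zeros(shape))

obj_list1 = [make("b.dat", (4,)), make("a.dat", (2, 3)), make("c.dat", (2, 3))]
obj_list2 = [make("c.dat", (2, 3)), make("a.dat", (2, 3)), make("b.dat", (4,))]

def to_frame(objs, column):
    # One row per object; the shape tuple is stored as-is in an object column.
    return pd.DataFrame({
        "basename": [o.basename for o in objs],
        "shape": [o.data.shape for o in objs],
        column: objs,
    })

# An inner merge keeps only rows whose basename and shape occur in both lists.
merged = to_frame(obj_list1, "obj_1").merge(
    to_frame(obj_list2, "obj_2"), on=["basename", "shape"]
)

# One sub-frame per distinct shape.
groups = {shape: frame for shape, frame in merged.groupby("shape")}
```

Each row of merged then holds a pair of objects that agree on basename and shape, and groups[(2, 3)] holds the rows for that shape.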

You can create a dictionary mapping (basename, shape) to a list of objects using collections.defaultdict:

from collections import defaultdict

d = defaultdict(list)

obj_list = [obj1, obj2, ..., objn]

for obj in obj_list:
    # Obj defines file and basename (there is no filename attribute);
    # basename is what the objects in the two lists have in common
    d[(obj.basename, obj.data.shape)].append(obj)
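
To go from one dictionary to the stated goal of matching objects across the two lists, one option is to build the same kind of dictionary for each list and intersect the keys. This is a minimal sketch, assuming the key is (basename, shape); the make and index_by_key helpers and the sample data are hypothetical.

```python
from collections import defaultdict
from types import SimpleNamespace

import numpy as np

# Hypothetical stand-in for Obj: only .data and .basename are needed here.
def make(basename, shape):
    return SimpleNamespace(basename=basename, data=np.zeros(shape))

obj_list = [make("a.dat", (2, 3)), make("b.dat", (4,)), make("c.dat", (5,))]
obj_list_2 = [make("b.dat", (4,)), make("c.dat", (6,)), make("a.dat", (2, 3))]

def index_by_key(objs):
    # Map (basename, shape) to the list of objects sharing that key.
    d = defaultdict(list)
    for obj in objs:
        d[(obj.basename, obj.data.shape)].append(obj)
    return d

d1 = index_by_key(obj_list)
d2 = index_by_key(obj_list_2)

# Keys present in both dictionaries identify objects with the
# same basename and the same shape.
common = d1.keys() & d2.keys()
pairs = {key: (d1[key], d2[key]) for key in common}
```

Here c.dat is left out of pairs because its shape differs between the two lists.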

Similarly, you can group by shape alone if you wish:

d_shape = defaultdict(list)

for obj in obj_list:
    d_shape[obj.data.shape].append(obj)

You can then access the unique shapes via d_shape.keys(), and the list of objects with a given shape via d_shape[some_shape]. The benefit of this solution is that grouping takes O(n) time, while sorting has higher complexity, e.g. O(n log n).
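
If the groups are still wanted in shape order, a possible refinement is to sort only the distinct shapes rather than the whole list. The sketch below uses hypothetical sample objects built with a make helper that is not part of the question.

```python
from collections import defaultdict
from types import SimpleNamespace

import numpy as np

# Hypothetical stand-in for Obj: only .data and .basename are needed here.
def make(basename, shape):
    return SimpleNamespace(basename=basename, data=np.zeros(shape))

obj_list = [make("b.dat", (4,)), make("a.dat", (2, 3)), make("c.dat", (2, 3))]

d_shape = defaultdict(list)
for obj in obj_list:
    d_shape[obj.data.shape].append(obj)

# Only the k distinct shapes are sorted, which costs O(k log k)
# on top of the O(n) grouping pass.
ordered = [
    (shape, [o.basename for o in d_shape[shape]])
    for shape in sorted(d_shape)
]
```

Within each group the objects keep the order in which they appeared in obj_list.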
