Finding a matching subset of "rows" in a NumPy structured array

Question

I have data stored in a NumPy structured array where part of the information identifies various cases. I would like to find the row that matches a given case. For example, say I'm storing the name of a building, the room number, and the numbers of chairs and tables in the room (as a (2,) subarray field). This would look something like this:

import numpy as np

my_dtype = [('building', '<U5'), ('room', '<i8'), ('seating', '<i8', (2,))]
room_info = np.array([('BLDG0', 12, [24, 6]),
                      ('BLDG1', 34, [32, 10]),
                      ('BLDG0', 14, [10, 20])],
                      dtype=my_dtype)

Now say that I want to find the row for building 'BLDG0', room 14. Based on the answer to "Finding a matching row in a numpy matrix", I tried

sub_fields = ['building', 'room']
matching_index, = np.where(room_info[sub_fields] == ('BLDG0', 14))

which would ideally result in [2]. However, it produces the following warning:

FutureWarning: elementwise == comparison failed and returning scalar instead; this will raise an error or perform elementwise comparison in the future.

and returns an empty array. Is there a way to find the matching sub-row in a large data set other than comparing each column separately and then finding the matching indices?

I am using NumPy version 1.18.5 through miniconda, and it doesn't look like I can safely update to a newer version within this environment. (Though I'm not sure whether newer versions support this type of comparison.)

Answer 1

In [243]: my_dtype = [('building', '<U5'), ('room', '<i8'), ('seating', '<i8', (2,))]
     ...: room_info = np.array([('BLDG0', 12, [24, 6]),
     ...:                       ('BLDG1', 34, [32, 10]),
     ...:                       ('BLDG0', 14, [10, 20])],
     ...:                       dtype=my_dtype)
In [244]: room_info
Out[244]: 
array([('BLDG0', 12, [24,  6]), ('BLDG1', 34, [32, 10]),
       ('BLDG0', 14, [10, 20])],
      dtype=[('building', '<U5'), ('room', '<i8'), ('seating', '<i8', (2,))])

In [246]: room_info['building']
Out[246]: array(['BLDG0', 'BLDG1', 'BLDG0'], dtype='<U5')
In [247]: room_info['building']=='BLDG0'
Out[247]: array([ True, False,  True])

In [248]: room_info['room']==14
Out[248]: array([False, False,  True])

Combine the two:

In [249]: Out[247] & Out[248]
Out[249]: array([False, False,  True])

Use that as a boolean mask:

In [250]: room_info[_]
Out[250]: 
array([('BLDG0', 14, [10, 20])],
      dtype=[('building', '<U5'), ('room', '<i8'), ('seating', '<i8', (2,))])

And get the index:

In [251]: np.nonzero(Out[247]&Out[248])
Out[251]: (array([2]),)
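
The per-field comparisons above generalize to any number of fields. Here is a minimal sketch, assuming every queried field holds a single scalar per row (so it does not handle the (2,) 'seating' field); match_rows is a hypothetical helper name, not part of NumPy:

```python
import numpy as np

my_dtype = [('building', '<U5'), ('room', '<i8'), ('seating', '<i8', (2,))]
room_info = np.array([('BLDG0', 12, [24, 6]),
                      ('BLDG1', 34, [32, 10]),
                      ('BLDG0', 14, [10, 20])],
                     dtype=my_dtype)


def match_rows(arr, criteria):
    """Hypothetical helper (not a NumPy function).

    Return the indices of rows whose fields equal every value in
    `criteria`, a dict mapping field name -> required scalar value.
    """
    # Start with "every row matches" and AND in one boolean mask per field.
    mask = np.ones(arr.shape, dtype=bool)
    for name, value in criteria.items():
        mask &= arr[name] == value
    # flatnonzero gives the positions of the True entries as a 1-D array.
    return np.flatnonzero(mask)


print(match_rows(room_info, {'building': 'BLDG0', 'room': 14}))  # [2]
print(match_rows(room_info, {'building': 'BLDG0'}))              # [0 2]
```

Each comparison is a vectorized operation over one field, so this stays efficient for large arrays; it is still "comparing each column separately", just wrapped so the caller doesn't have to.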

It looks like we can also test both fields at once by comparing against a properly constructed structured array:

In [254]: test=np.array(('BLDG0',14),dtype=my_dtype[:2])
In [255]: room_info[['building','room']]
Out[255]: 
array([('BLDG0', 12), ('BLDG1', 34), ('BLDG0', 14)],
      dtype={'names':['building','room'], 'formats':['<U5','<i8'], 'offsets':[0,20], 'itemsize':44})
In [256]: room_info[['building','room']]==test
Out[256]: array([False, False,  True])
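
The same whole-record comparison can be written as a small script. This sketch assumes numpy.lib.recfunctions.repack_fields (available since NumPy 1.16, so also in 1.18.5) is used to copy the multi-field view into a packed dtype, so the query and the keys have identical dtypes. That is a defensive choice on my part, not something the session above required:

```python
import numpy as np
import numpy.lib.recfunctions as rfn

my_dtype = [('building', '<U5'), ('room', '<i8'), ('seating', '<i8', (2,))]
room_info = np.array([('BLDG0', 12, [24, 6]),
                      ('BLDG1', 34, [32, 10]),
                      ('BLDG0', 14, [10, 20])],
                     dtype=my_dtype)

sub_fields = ['building', 'room']
# Multi-field indexing returns a view that keeps the parent's offsets and
# itemsize; repack_fields copies it into a packed dtype with just these fields.
keys = rfn.repack_fields(room_info[sub_fields])

# Build the query as a 0-d structured array with the same dtype,
# instead of comparing against a plain tuple.
query = np.array(('BLDG0', 14), dtype=keys.dtype)

matching_index, = np.nonzero(keys == query)
print(matching_index)  # [2]
```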

Answer 1 (accepted), 2021-08-21 00:36:12