Efficient way to compress a numpy array (Python)

Question

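I am looking for an efficient way to compress a numpy array. I have an array like this: dtype=[('name', (np.str_, 8)), ('job', (np.str_, 8)), ('income', np.uint32)] (my favourite example).

If I do something like this: my_array.compress(my_array['income'] > 10000), I get a new array containing only the records with income > 10000, and it is quite fast.

But if I want to filter on jobs contained in a list, it doesn't work!

my_array.compress(my_array['job'] in ['this', 'that'])

Error: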

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

So I have to do something like this:

np.array([x for x in my_array if x['job'] in ['this', 'that']])

This is both ugly and inefficient!

Do you have an idea for making it more efficient?

Answer 1

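It's not as nice as what you'd like, but I think you can do:

import numpy

# Start with the mask for the first job, then OR in each remaining job.
mask = my_array['job'] == 'this'
for condition in ['that', 'other']:
    mask = numpy.logical_or(mask, my_array['job'] == condition)
selected_array = my_array[mask]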
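For what it's worth, newer NumPy releases (1.13 and later) provide np.isin, which builds the same kind of boolean mask in a single call. The snippet below is only a sketch of that idea, not part of the original answer; the sample records and the name wanted_jobs are made up for illustration.

```python
import numpy as np

# Hypothetical sample data using the dtype from the question.
dtype = [('name', (np.str_, 8)), ('job', (np.str_, 8)), ('income', np.uint32)]
my_array = np.array(
    [('alice', 'this', 12000),
     ('bob', 'other', 8000),
     ('carol', 'that', 30000),
     ('dave', 'none', 15000)],
    dtype=dtype,
)

# Elementwise membership test: True where the job is one of the wanted jobs.
wanted_jobs = ['this', 'that']
mask = np.isin(my_array['job'], wanted_jobs)

# Boolean indexing keeps only the records where the mask is True.
selected_array = my_array[mask]
print(selected_array)
```

The mask here is equivalent to OR-ing together one equality comparison per job, as the loop above does.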

Answer 2

The best way to compress a numpy array is to use pytables. It is the de facto standard when it comes to handling large amounts of numerical data.

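import tables as t
# PyTables 3.0 renamed openFile/createArray to open_file/create_array;
# mode='w' is needed so the file can be written to.
hdf5_file = t.open_file('outfile.hdf5', mode='w')
hdf5_file.create_array ......
hdf5_file.close()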
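Note that this answer reads "compress" as on-disk compression rather than filtering. If that is what you need and you would rather not add a dependency, NumPy's own np.savez_compressed is a different, simpler technique than PyTables that can be sketched as follows. The sample records, the file name outfile.npz and the temporary directory are assumptions made for illustration only.

```python
import os
import tempfile

import numpy as np

# Hypothetical sample data using the dtype from the question.
dtype = [('name', (np.str_, 8)), ('job', (np.str_, 8)), ('income', np.uint32)]
my_array = np.array(
    [('alice', 'this', 12000),
     ('bob', 'other', 8000),
     ('carol', 'that', 30000)],
    dtype=dtype,
)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'outfile.npz')

    # Write the array into a zip-compressed .npz archive under the key 'my_array'.
    np.savez_compressed(path, my_array=my_array)

    # Read it back; the archive is closed when the with-block exits.
    with np.load(path) as archive:
        restored = archive['my_array']

# The structured dtype and every record should survive the round trip.
round_trip_ok = (restored.dtype == my_array.dtype
                 and restored.tolist() == my_array.tolist())
print(round_trip_ok)
```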

Answer 3

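If you're looking for a numpy-only solution, I don't think you'll get one. Still, although it does a lot of work under the covers, consider whether the tabular package might be able to do what you want in a less "ugly" fashion. I'm not sure you'll get much more "efficient" without writing a C extension yourself.

By the way, I think this is both efficient enough and pretty enough for just about any real case.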

my_array.compress([x in ['this', 'that'] for x in my_array['job']])

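As an extra step towards making this less ugly and more efficient, you would presumably not have a hard-coded list in the middle, so I would use a set instead, since a set is much faster to search than a list once it holds more than a few items: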

job_set = set(['this', 'that'])
my_array.compress([x in job_set for x in my_array['job']])
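Here is a self-contained sketch of the set-based filter so it can be run as-is. The sample records are made up, and np.fromiter is a small variation that is not part of the answer: it fills the boolean mask straight from a generator instead of building an intermediate Python list.

```python
import numpy as np

# Hypothetical sample data using the dtype from the question.
dtype = [('name', (np.str_, 8)), ('job', (np.str_, 8)), ('income', np.uint32)]
my_array = np.array(
    [('alice', 'this', 12000),
     ('bob', 'other', 8000),
     ('carol', 'that', 30000),
     ('dave', 'none', 15000)],
    dtype=dtype,
)

job_set = {'this', 'that'}

# One set lookup per record; count lets NumPy preallocate the mask.
mask = np.fromiter((job in job_set for job in my_array['job']),
                   dtype=bool, count=len(my_array))

# compress keeps the records whose mask entry is True.
selected_array = my_array.compress(mask)
print(selected_array)
```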

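If you think this isn't efficient enough, I'd advise benchmarking, so that you can be confident you are spending your time wisely as you try to make it even more efficient.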
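A benchmark along those lines could look like the sketch below, using the standard-library timeit module. Everything about the test data (record count, job names, random seed) is an assumption, and np.isin is included only as one possible vectorized candidate to compare against; it is not something this answer proposes. Timings will vary by machine and NumPy version, so no result is claimed here.

```python
import timeit

import numpy as np

# Hypothetical test data: 10,000 records with randomly assigned jobs.
rng = np.random.default_rng(0)
n_records = 10_000
dtype = [('name', (np.str_, 8)), ('job', (np.str_, 8)), ('income', np.uint32)]
my_array = np.zeros(n_records, dtype=dtype)
my_array['job'] = rng.choice(['this', 'that', 'other', 'none'], size=n_records)
my_array['income'] = rng.integers(0, 50_000, size=n_records)

job_set = {'this', 'that'}


def filter_with_set():
    # The list-comprehension approach from this answer.
    return my_array.compress([x in job_set for x in my_array['job']])


def filter_with_isin():
    # A vectorized candidate for comparison (an assumption, see above).
    return my_array[np.isin(my_array['job'], list(job_set))]


# Check that both approaches select the same records before timing them.
same_result = filter_with_set().tolist() == filter_with_isin().tolist()

set_seconds = timeit.timeit(filter_with_set, number=5)
isin_seconds = timeit.timeit(filter_with_isin, number=5)
print(f"same result: {same_result}")
print(f"set + list comprehension: {set_seconds:.4f} s for 5 runs")
print(f"np.isin:                  {isin_seconds:.4f} s for 5 runs")
```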

Answer 1: 1 2009-12-09 00:44:11
Answer 2: 1 2010-10-21 13:26:22
Answer 3: 0 2009-12-10 14:51:17
