Converting a sparse format to a NumPy array

Question

Suppose you have a generator that returns the rows of a table, something you could use like this:

for (labels, value) in rows:
    pass

"labels" has length n; say it's all strings for simplicity. "value" is something numeric, like a float.

Is there a fast, best, or built-in way to hash the labels and end up with an n-dimensional array of values plus n lists telling you how to map the label values to indices? I think you could maybe store this in a recarray? I do this all the time, but it always ends up being a bit of throwaway code. I'd like to find or create something more reusable.

I would be very happy with (('here', 'there', 'nowhere'), 1.234) being mapped to either results['here']['there']['nowhere'] = 1.234 or results[12,3,45] = 1.234 (with corresponding lists giving the labels along the axis in each dimension).

I could probably write a conversion from the generator to a sparse format and use scipy, but this seems like a nasty way to do something relatively simple.

I see a lot of similar-sounding questions, but none that exactly answers this one. Maybe I'm missing a search phrase.

Answer 1

You could try using a structured array:

result = np.fromiter(your_generator, dtype=[('labels', '|S10'), ('value', float)])

You'll be able to retrieve an ndarray of labels as result['labels'] (and the values as result['value'], of course).
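As a minimal sketch of the np.fromiter call above, assuming each row carries a single string label: the generator single_label_rows and its sample data are made up for illustration, and 'U10' is used instead of '|S10' because under Python 3 the 'S' type stores bytes rather than str.

```python
import numpy as np

def single_label_rows():
    # Hypothetical generator: one string label and one float per row.
    yield ("here", 1.234)
    yield ("there", 2.5)
    yield ("nowhere", -0.75)

# 'U10' holds unicode strings of up to 10 characters; longer labels
# would be truncated, so pick a width that fits your data.
record_dtype = [("labels", "U10"), ("value", float)]
result = np.fromiter(single_label_rows(), dtype=record_dtype)

labels = result["labels"]   # 1-D array of the label strings
values = result["value"]    # 1-D float array, aligned with labels
```

Each yielded item has to be a flat tuple matching the dtype fields for this to work; rows whose labels are themselves tuples need the sub-array dtype described next.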

Note that you could store several labels in the same field (provided every row has the same number of them), as in

result = np.array([(('a','b','c'), 1.23),(('a','c','d'), 2.34)],
                  dtype=[('label', ("|S10", 3)), ('value', float)])

where each individual record is given as a tuple, and the three 'labels' of a record are themselves a tuple. You can also name each 'label' individually using a tailored dtype, for example:

 dtype=[('label',[('A','|S10'),('B','|S10'),('C','|S10')]),('value',float)]

This way, you could access all the A labels through result['label']['A'], and likewise for B and C.
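The structured array only stores the rows; it does not yet give the n-dimensional array and the n label lists asked for in the question. One possible way to get there, sketched under the assumption that every label fits in 10 characters and that each label combination occurs at most once, is to run np.unique with return_inverse=True on each label column. The names rows, densify, and the fill value below are illustrative, not part of the original answer.

```python
import numpy as np

def rows():
    # Hypothetical generator matching the question: (labels, value).
    yield (("here", "there", "nowhere"), 1.234)
    yield (("here", "elsewhere", "nowhere"), 2.5)
    yield (("gone", "there", "somewhere"), -0.75)

def densify(row_iter, n_labels, fill=np.nan):
    """Return (dense n-d array, list of per-axis label lists)."""
    # Structured array with an n-element label sub-array per record.
    # Assumption: 'U10' is wide enough; longer labels get truncated.
    dtype = [("label", ("U10", (n_labels,))), ("value", float)]
    records = np.array(list(row_iter), dtype=dtype)

    axis_labels = []     # axis_labels[k][i] is the label at index i on axis k
    index_columns = []   # integer index of every row along each axis
    for k in range(n_labels):
        uniq, inverse = np.unique(records["label"][:, k], return_inverse=True)
        axis_labels.append(uniq.tolist())
        index_columns.append(inverse)

    shape = tuple(len(labels) for labels in axis_labels)
    dense = np.full(shape, fill)
    # Fancy indexing scatters each value to its n-d position;
    # a repeated label combination would silently overwrite.
    dense[tuple(index_columns)] = records["value"]
    return dense, axis_labels

dense, axis_labels = densify(rows(), n_labels=3)

# Look up a value by its labels via the per-axis lists.
idx = tuple(axis_labels[k].index(name)
            for k, name in enumerate(("here", "there", "nowhere")))
looked_up = dense[idx]
```

np.unique returns the labels in sorted order, so the indices follow alphabetical order rather than order of first appearance. Cells with no corresponding row keep the fill value, and the dense array grows as the product of the number of distinct labels per axis, so for very sparse data a sparse representation may still be the better fit.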

Answer 1 · 0 · 2012-10-01 20:51:50