

Convert sparse format to array (NumPy)

Suppose you have a generator that returns the rows of a table, something you could use like this:

for (labels, value) in rows:
    pass

"labels" has length n; for simplicity, say its elements are all strings. "value" is numeric, such as a float.

Is there a fast, best-practice, or built-in way to hash the labels and end up with an n-dimensional array of values plus n lists telling you how to map the label values to indices? I think you could maybe store this in a recarray? I do this all the time, but it always ends up being a bit of throwaway code. I'd like to find or create something more reusable.

I would be very happy with (('here', 'there', 'nowhere'), 1.234) being mapped to either results['here']['there']['nowhere'] = 1.234 or results[12,3,45] = 1.234 (along with, for each dimension, a list giving the label at each index along that axis).
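One way to get the second form, results[12,3,45], without going through scipy is to give each dimension its own dict that hands out the next free integer index the first time a label is seen, then scatter the values into a dense array. This is a minimal sketch, assuming every labels tuple has the same length n and that no label combination occurs twice; the function name rows_to_dense and the NaN fill value for missing combinations are illustrative choices, not part of the question.

```python
import numpy as np

def rows_to_dense(rows, fill_value=np.nan):
    """Scatter (labels, value) rows into a dense n-d array (hypothetical helper).

    Returns the array and, for each axis, the list of labels in index order.
    Assumes every labels tuple has the same length n and that no label
    combination occurs twice.
    """
    index_maps = None  # one dict per axis: label -> integer index
    coords, values = [], []
    for labels, value in rows:
        if index_maps is None:
            index_maps = [{} for _ in labels]
        # setdefault hands out the next free index the first time a label appears
        coords.append(tuple(m.setdefault(lab, len(m))
                            for m, lab in zip(index_maps, labels)))
        values.append(value)
    if index_maps is None:
        raise ValueError("rows is empty")

    result = np.full(tuple(len(m) for m in index_maps), fill_value, dtype=float)
    # zip(*coords) turns the list of coordinate tuples into one sequence per axis
    result[tuple(np.array(axis) for axis in zip(*coords))] = values
    # dicts keep insertion order, so list(m) lists the labels by index
    return result, [list(m) for m in index_maps]


rows = [(('here', 'there', 'nowhere'), 1.234),
        (('here', 'everywhere', 'nowhere'), 5.0)]
results, axis_labels = rows_to_dense(iter(rows))
# results.shape == (1, 2, 1)
# axis_labels == [['here'], ['there', 'everywhere'], ['nowhere']]
```

Because dicts preserve insertion order (Python 3.7+), each axis list is in first-seen order; keeping the dicts themselves would give the reverse lookup from label to index directly.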

I could probably write a conversion from the generator to a sparse format and use scipy, but this seems like a nasty way to do something relatively simple.

I see a lot of similar-sounding questions, but none that exactly answer this one. Maybe I'm missing a search phrase.

You could try using a structured array:

import numpy as np
result = np.fromiter(your_generator, dtype=[('labels', '|S10'), ('value', float)])

You'll be able to retrieve an ndarray of labels as result['labels'] (and the values as result['value'], of course).
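A self-contained version of the fromiter call above, assuming a hypothetical generator make_rows that yields one (label, value) tuple per row. Since the single '|S10' field holds one label, this form only covers the n = 1 case. Under Python 3 that field stores bytes, so the retrieved labels come back as b'...' values.

```python
import numpy as np

def make_rows():
    """Hypothetical stand-in for the question's generator: one label per row."""
    yield ('here', 1.234)
    yield ('there', 5.0)
    yield ('nowhere', -2.5)

# '|S10' is a 10-byte string field: under Python 3, str labels are encoded to
# bytes on assignment (ASCII only). Use 'U10' to keep them as str instead.
result = np.fromiter(make_rows(), dtype=[('labels', '|S10'), ('value', float)])

labels = result['labels']  # bytes: b'here', b'there', b'nowhere'
values = result['value']   # floats: 1.234, 5.0, -2.5
```

Each yielded item must be a tuple matching the fields in order; fromiter fills one record per item without building an intermediate list.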

Note that you can store several labels in the same field (provided every record always has the same number of them), as in

result = np.array([(('a','b','c'), 1.23),(('a','c','d'), 2.34)],
                  dtype=[('label', ("|S10", 3)), ('value', float)])

where each record is given as a tuple, and the three labels of a record are themselves given as a tuple. You can also name each label individually using a tailored dtype, for example:

dtype=[('label', [('A', '|S10'), ('B', '|S10'), ('C', '|S10')]), ('value', float)]

This way, you can access all the A labels through result['label']['A'], and so on for B and C.
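The answer stops at the structured array; the sketch below is an addition, not part of it, showing one way to get from the named-field layout to the n-dimensional array the question asks for. It assumes 'U10' in place of '|S10' (so labels come back as str under Python 3) and uses np.unique(..., return_inverse=True) to turn each label column into integer indices. The helper name to_dense is hypothetical, and it assumes each label combination occurs at most once.

```python
import numpy as np

# Named sub-fields as in the dtype above, but 'U10' instead of '|S10'
# so that labels come back as str rather than bytes under Python 3.
dtype = [('label', [('A', 'U10'), ('B', 'U10'), ('C', 'U10')]),
         ('value', float)]
result = np.array([(('a', 'b', 'c'), 1.23),
                   (('a', 'c', 'd'), 2.34)], dtype=dtype)

all_a = result['label']['A']  # every "A" label, one per record


def to_dense(result, fill_value=np.nan):
    """Build a dense n-d array from the structured array (hypothetical helper).

    Returns the array plus one sorted label list per axis. Assumes each
    label combination occurs at most once.
    """
    axis_labels, indices = [], []
    for name in result.dtype['label'].names:  # ('A', 'B', 'C')
        # np.unique sorts the labels; inverse gives each record's position
        uniq, inv = np.unique(result['label'][name], return_inverse=True)
        axis_labels.append(uniq.tolist())
        indices.append(inv)
    dense = np.full(tuple(len(u) for u in axis_labels), fill_value)
    dense[tuple(indices)] = result['value']
    return dense, axis_labels


dense, axis_labels = to_dense(result)
# dense.shape == (1, 2, 2); axis_labels == [['a'], ['b', 'c'], ['c', 'd']]
```

Unlike a first-seen ordering, np.unique sorts each axis's labels, so axis_labels[k] is in sorted order; combinations that never occur stay at the fill value.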
