
Python Numpy 2d array with non-integer index

Background: I'm trying to build an affinity matrix to feed into sklearn spectral clustering.

The problem is that numpy array indexes are 0-based integers, whereas my application identifies points by application-specific string IDs (for example, "abc123"). I would like to create a 2d numpy array indexed by the data points themselves. For instance, given two points, points = ["abc123", "xyz456"], I would have a 2d numpy array whose row and column indexes are both points, so that I could specify the distance between two points with something like arr["abc123"]["xyz456"] = dist.

How could I achieve that? Thank you.

Pandas can do this, and much more:

In [41]: import pandas as pd

In [42]: import numpy as np

In [122]: a = np.random.randint(100, size=(5, 3))

In [123]: a
Out[123]:
array([[53,  7, 34],
       [54, 56, 85],
       [ 0, 11, 83],
       [63, 28, 88],
       [65, 19, 44]])

In [124]: df = pd.DataFrame(a, index=list('abcde'), columns=list('xyz'))

In [125]: df
Out[125]:
    x   y   z
a  53   7  34
b  54  56  85
c   0  11  83
d  63  28  88
e  65  19  44

In [126]: df.loc[['a','d'], ['x','y']]
Out[126]:
    x   y
a  53   7
d  63  28

We can always get a Numpy array back from the DataFrame using the .values accessor (newer pandas versions also provide df.to_numpy() for the same purpose):

In [127]: df.values
Out[127]:
array([[53,  7, 34],
       [54, 56, 85],
       [ 0, 11, 83],
       [63, 28, 88],
       [65, 19, 44]])

In [128]: df.loc[['a','d'], ['x','y']].values
Out[128]:
array([[53,  7],
       [63, 28]])
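
Applied to the setup in the question, this might look like the minimal sketch below. The first two IDs come from the question; the third ID, the distance value, and the variable names are made up for illustration.

```python
import numpy as np
import pandas as pd

# IDs from the question, plus one hypothetical extra point
points = ["abc123", "xyz456", "def789"]

# Square frame of zeros, labelled by ID on both axes
dist = pd.DataFrame(
    np.zeros((len(points), len(points))),
    index=points,
    columns=points,
)

# Set one distance by label and mirror it so the matrix stays symmetric
dist.loc["abc123", "xyz456"] = 0.5
dist.loc["xyz456", "abc123"] = 0.5

# Plain Numpy array; rows and columns follow the order of `points`
arr = dist.to_numpy()
```

Note that assignment goes through dist.loc[row, col] rather than the chained dist["abc123"]["xyz456"] form from the question: chained indexing on a DataFrame selects a column first and is not a reliable way to write values.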

You can use a dictionary keyed by your IDs, but if you still require a numpy array you can experiment with a structured dtype. From the docs:

>>> dt = np.dtype([('name', np.str_, 16), ('grades', np.float64, (2,))])
>>> x = np.array([('Sarah', (8.0, 7.0)), ('John', (6.0, 7.0))], dtype=dt)
>>> x[1]
('John', [6.0, 7.0])
>>> x[1]['grades']
array([ 6.,  7.])
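
The dictionary route mentioned above could be sketched as a mapping from each ID to an integer position, kept next to an ordinary numpy array. The names index_of and set_dist are hypothetical helpers for this example, not part of Numpy.

```python
import numpy as np

points = ["abc123", "xyz456"]  # IDs from the question

# Map each string ID to its row/column position in the array
index_of = {p: i for i, p in enumerate(points)}

arr = np.zeros((len(points), len(points)))

def set_dist(a, b, value):
    """Store a symmetric distance between two IDs (hypothetical helper)."""
    i, j = index_of[a], index_of[b]
    arr[i, j] = value
    arr[j, i] = value

set_dist("abc123", "xyz456", 1.25)
print(arr[index_of["abc123"], index_of["xyz456"]])  # 1.25
```

This keeps arr as a plain float array that can be handed to other libraries directly, at the cost of carrying the lookup dictionary around separately.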
