Background: I'm trying to build an affinity matrix to feed into sklearn's spectral clustering.
The problem is that numpy array indices are 0-based integers, while my application uses its own string-based IDs (a random example: "abc123"). I would like to create a 2D numpy array indexed by my data points. For instance, given two points points = ["abc123", "xyz456"]
, I would have a 2D numpy array whose row and column indices are the points,
so that I could easily specify the distance between two points with something similar to arr["abc123"]["xyz456"] = dist.
How could I achieve that? Thank you.
Pandas can do this and much much more...
In [40]: import numpy as np
In [41]: import pandas as pd
In [122]: a = np.random.randint(100, size=(5, 3))
In [123]: a
Out[123]:
array([[53,  7, 34],
       [54, 56, 85],
       [ 0, 11, 83],
       [63, 28, 88],
       [65, 19, 44]])
In [124]: df = pd.DataFrame(a, index=list('abcde'), columns=list('xyz'))
In [125]: df
Out[125]:
    x   y   z
a  53   7  34
b  54  56  85
c   0  11  83
d  63  28  88
e  65  19  44
In [126]: df.loc[['a','d'], ['x','y']]
Out[126]:
    x   y
a  53   7
d  63  28
We can always get a NumPy array from the DataFrame using the .values
accessor:
In [127]: df.values
Out[127]:
array([[53,  7, 34],
       [54, 56, 85],
       [ 0, 11, 83],
       [63, 28, 88],
       [65, 19, 44]])
In [128]: df.loc[['a','d'], ['x','y']].values
Out[128]:
array([[53,  7],
       [63, 28]])
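Applied to the original question, here is a minimal sketch of the pandas approach (the point IDs and distance value below are made up for illustration): build a square DataFrame whose row and column labels are the string IDs, set pairwise distances with .loc, and hand the underlying array to sklearn.

```python
import numpy as np
import pandas as pd

points = ["abc123", "xyz456", "def789"]

# Square DataFrame with the point IDs as both row and column labels
arr = pd.DataFrame(np.zeros((len(points), len(points))),
                   index=points, columns=points)

# Set the distance between two points by ID (symmetric matrix)
arr.loc["abc123", "xyz456"] = 0.5
arr.loc["xyz456", "abc123"] = 0.5

# Plain 0-based numpy array, ready for sklearn's spectral clustering
affinity = arr.values
```

The IDs exist only at the DataFrame layer; .values strips them off, so sklearn never sees anything but an ordinary numpy array.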
You can use a dictionary with keys, but if you still require a numpy array you can play with dtype.
From the docs:
>>> dt = np.dtype([('name', np.unicode_, 16), ('grades', np.float64, (2,))])
>>> x = np.array([('Sarah', (8.0, 7.0)), ('John', (6.0, 7.0))], dtype=dt)
>>> x[1]
('John', [6.0, 7.0])
>>> x[1]['grades']
array([ 6., 7.])
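Expanding on the dictionary idea above, a common lightweight pattern (a sketch, not from the numpy docs) is to keep a plain 2D numpy array and map each string ID to its 0-based row/column position with a dict:

```python
import numpy as np

points = ["abc123", "xyz456"]

# ID -> 0-based index lookup
idx = {p: i for i, p in enumerate(points)}

arr = np.zeros((len(points), len(points)))

# Equivalent of the desired arr["abc123"]["xyz456"] = dist
arr[idx["abc123"], idx["xyz456"]] = 0.5
```

This keeps the array itself purely numeric, so it can be passed straight to sklearn with no conversion step.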