I want to create a dictionary from a 2D ndarray that holds several million rows.
I am looking for a Pythonic and performant way to achieve this.
My ndarray:
format: [id, origin_lat, origin_lon, dest_lat, dest_lon, distance]
my_array = np.array([[245, 32.45,63.89,72.1,63.57,123.45],
[246, 61.73,42.71,75.54,-81.69,16.32]])
Expected Output:
my_dict = {
    245: {
        'origin_lat_lon': {
            'lat': 32.45,
            'lon': 63.89
        },
        'dest_lat_lon': {
            'lat': 72.1,
            'lon': 63.57
        },
        'distance': 123.45
    },
    246: {
        'origin_lat_lon': {
            'lat': 61.73,
            'lon': 42.71
        },
        'dest_lat_lon': {
            'lat': 75.54,
            'lon': -81.69
        },
        'distance': 16.32
    }
}
my_list = [{'lat': 32.45, 'lon': 63.89},
           {'lat': 72.1, 'lon': 63.57},
           {'lat': 61.73, 'lon': 42.71},
           {'lat': 75.54, 'lon': -81.69}]
My code:
my_dict = dict()
my_list = list()
for arr in my_array:
    origin_lat_lon = {'lat': arr[1],
                      'lon': arr[2]}
    dest_lat_lon = {'lat': arr[3],
                    'lon': arr[4]}
    value = {'origin_lat_lon': origin_lat_lon, 'dest_lat_lon': dest_lat_lon, 'distance': arr[5]}
    my_dict[int(arr[0])] = value
    my_list.append(origin_lat_lon)
    my_list.append(dest_lat_lon)
This is one approach using dict with zip and slicing.
Ex:
import numpy as np
my_array = np.array([[245, 32.45,63.89,72.1,63.57,123.45],[246, 61.73,42.71,75.54,-81.69,16.32]])
keys = ['origin_lat', 'origin_lon', 'dest_lat','dest_lon', 'distance']
keys_2 = ['lat', 'lon']
my_dict = {}
my_list = []
for arr in my_array:
    key, vals = arr[0], arr[1:]
    my_dict[int(key)] = dict(zip(keys, vals))
    my_list.extend([[dict(zip(keys_2, vals[0:2]))], [dict(zip(keys_2, vals[2:4]))]])
print(my_dict)
print(my_list)
Output:
{245: {'dest_lat': 72.1,
'dest_lon': 63.57,
'distance': 123.45,
'origin_lat': 32.45,
'origin_lon': 63.89},
246: {'dest_lat': 75.54,
'dest_lon': -81.69,
'distance': 16.32,
'origin_lat': 61.73,
'origin_lon': 42.71}}
[[{'lat': 32.45, 'lon': 63.89}],
[{'lat': 72.1, 'lon': 63.57}],
[{'lat': 61.73, 'lon': 42.71}],
[{'lat': 75.54, 'lon': -81.69}]]
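If the nested layout and the flat list from the question are needed, the same zip-and-slicing idea can be adapted. The following is only a sketch under that assumption, not a tuned solution; the names origin and dest are introduced here for illustration.

```python
import numpy as np

my_array = np.array([[245, 32.45, 63.89, 72.1, 63.57, 123.45],
                     [246, 61.73, 42.71, 75.54, -81.69, 16.32]])
keys_2 = ['lat', 'lon']

my_dict = {}
my_list = []
for arr in my_array:
    # tolist() turns the numpy scalars into plain Python floats
    key, vals = arr[0], arr[1:].tolist()
    origin = dict(zip(keys_2, vals[0:2]))   # illustrative name
    dest = dict(zip(keys_2, vals[2:4]))     # illustrative name
    my_dict[int(key)] = {'origin_lat_lon': origin,
                         'dest_lat_lon': dest,
                         'distance': vals[4]}
    # extend with the dicts themselves so my_list stays a flat list
    my_list.extend([origin, dest])

print(my_dict)
print(my_list)
```

Note that the dicts stored in my_list are the same objects as those nested inside my_dict, which is also the case in the question's own loop.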
Your code, wrapped in a function, times as follows:
In [220]: timeit foo(my_array)
7.14 µs ± 17.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
Converting the array to a list first cuts the time by more than half. tolist() is a (relatively) fast method for converting an array to a nested list. Iterating on a list is faster than iterating on an array:
In [221]: timeit foo(my_array.tolist())
2.68 µs ± 14.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
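For reference, here is a minimal sketch of what such a function might look like. The body of foo is an assumption (the question's loop wrapped in a function, with tuple unpacking added for readability), not the exact code that was timed.

```python
import numpy as np

def foo(rows):
    """Assumed stand-in for the question's loop; works on an array or a nested list."""
    my_dict, my_list = {}, []
    for id_, o_lat, o_lon, d_lat, d_lon, dist in rows:
        origin = {'lat': o_lat, 'lon': o_lon}
        dest = {'lat': d_lat, 'lon': d_lon}
        my_dict[int(id_)] = {'origin_lat_lon': origin,
                             'dest_lat_lon': dest,
                             'distance': dist}
        my_list.append(origin)
        my_list.append(dest)
    return my_dict, my_list

my_array = np.array([[245, 32.45, 63.89, 72.1, 63.57, 123.45],
                     [246, 61.73, 42.71, 75.54, -81.69, 16.32]])

# tolist() converts the whole array once, so the loop handles plain Python floats
d, l = foo(my_array.tolist())
print(d)
print(l)
```

With the list input the stored values are plain Python floats rather than numpy scalars, which also tends to help if the result is later serialized.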
Rakesh's version is somewhat slower (I haven't identified why):
In [222]: timeit rakesh(my_array)
18.5 µs ± 63.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [223]: timeit rakesh(my_array.tolist())
9.49 µs ± 26.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
Chris's pandas version is quite a bit slower. pandas does have a nice interface to/from dictionaries, but apparently it isn't fast. It is probably pure Python, and loses speed by being general purpose.
In [224]: timeit foo_pd(my_array)
3.35 ms ± 5.69 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Python dictionaries are efficient for what they do, but they still have to be accessed key by key. numpy does not have any compiled code of its own for working with dictionaries.
===
Your array could be cast as a structured array. With that, columns are replaced by fields, which are accessed by name. So it's more dictionary-like, though probably not any better for creating JSON output. (And it's not a speed tool.)
In [225]: dt = np.dtype([('id',int),('origin_lat',float),('origin_lon',float),('dest_lat',float),('dest_lon',float),('distance',float)])
In [226]: import numpy.lib.recfunctions as rf
In [228]: sarr =rf.unstructured_to_structured(my_array, dt)
In [229]: sarr
Out[229]:
array([(245, 32.45, 63.89, 72.1 , 63.57, 123.45),
(246, 61.73, 42.71, 75.54, -81.69, 16.32)],
dtype=[('id', '<i8'), ('origin_lat', '<f8'), ('origin_lon', '<f8'), ('dest_lat', '<f8'), ('dest_lon', '<f8'), ('distance', '<f8')])
In [230]: sarr['dest_lon']
Out[230]: array([ 63.57, -81.69])
In [236]: timeit sarr =rf.unstructured_to_structured(my_array, dt)
46.3 µs ± 1.7 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
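If per-record dictionaries are still wanted from the structured array, one possible route is to pair the field names with each record. This is a sketch only; the names records and by_id are introduced here for illustration, and nothing suggests this is faster than the plain loop.

```python
import numpy as np
import numpy.lib.recfunctions as rf

my_array = np.array([[245, 32.45, 63.89, 72.1, 63.57, 123.45],
                     [246, 61.73, 42.71, 75.54, -81.69, 16.32]])
dt = np.dtype([('id', int), ('origin_lat', float), ('origin_lon', float),
               ('dest_lat', float), ('dest_lon', float), ('distance', float)])
sarr = rf.unstructured_to_structured(my_array, dt)

# sarr.tolist() yields one tuple of Python scalars per record
names = sarr.dtype.names
records = [dict(zip(names, rec)) for rec in sarr.tolist()]

# index the flat records by id (illustrative name)
by_id = {rec['id']: rec for rec in records}
print(by_id)
```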