I have a numpy array and a coo matrix. I need to update the numpy array based on elements in the coo matrix. Both the numpy array and the matrix are very large; here is what they look like:
graph_array = [[ 1.0 1.0 5.0 9.0]
[ 2.0 5.0 6.0 5.0]
[ 3.0 5.0 7.0 6.0]]
matrix_coo = (1, 5) 0.5
(2, 8) 0.4
(5, 7) 0.8
What I need to do is as follows:
If the second and third elements of a row in the array, i.e. list_graph[i][1] and list_graph[i][2] (which could be 1,5, 5,6 or 5,7), equal a row and column pair in the coo matrix, such as (1, 5), (2, 8) or (5, 7), then the value associated with that pair (for (1, 5) this is 0.5) must replace the fourth element of that row.
My expected output would thus be:
output_array = [[ 1.0 1.0 5.0 0.5]
[ 2.0 5.0 6.0 5.0]
[ 3.0 5.0 7.0 0.8]]
The current code I am using is as follows:
row_idx = list(matrix_coo.row)
col_idx = list(matrix_coo.col)
data_idx = list(matrix_coo.data)
x = 0
while x < len(row_cost_idx):
    for m in graph_array:
        if m[1] == row_idx[x]:
            if m[2] == col_idx[x]:
                m[3] = data_idx[x]
    x += 1
It does give me the correct output but because the array has 21596 items and the matrix has 21596 rows it takes a very long time.
Is there a faster way of doing this?
Your iteration is a pure Python list operation. The fact that row_idx originated as an attribute of a coo_matrix doesn't matter.
What is row_cost_idx? If it is the same as row_idx, the loop could be cleaned up a bit with:
for r,c,d in zip(matrix_coo.row, matrix_coo.col, matrix_coo.data):
    for m in graph_array:   # not list_graph?
        if m[1:3]==[r,c]:   # 2nd and 3rd elements; works when m is a list
            m[3] = d
I think the iteration is the same, but haven't tested it. I'm not sure about speed either.
The double iteration, over nonzero elements of matrix_coo and sublists of graph_array, is bound to be slow, simply because you are doing very many iterations.
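One way to avoid the inner loop entirely is a dictionary lookup keyed on the (row, col) pairs. This is not one of the tested versions in this answer, only a minimal sketch; the helper name update_with_dict is made up for illustration, and it assumes the second and third columns of graph_array hold integer-valued indices.

```python
import numpy as np
from scipy import sparse

graph_array = np.array([[1.0, 1.0, 5.0, 9.0],
                        [2.0, 5.0, 6.0, 5.0],
                        [3.0, 5.0, 7.0, 6.0]])
matrix_coo = sparse.coo_matrix(([0.5, 0.4, 0.8], ([1, 2, 5], [5, 8, 7])))

def update_with_dict(graph_array, matrix_coo):
    # Hypothetical helper: one pass over the matrix, one pass over the array.
    # If the matrix stores the same (row, col) twice, the later value wins,
    # which is also what the original double loop ends up doing.
    lookup = {(int(r), int(c)): d
              for r, c, d in zip(matrix_coo.row, matrix_coo.col, matrix_coo.data)}
    for m in graph_array:
        # (1.0, 5.0) hashes and compares equal to (1, 5), so float keys work
        key = (float(m[1]), float(m[2]))
        if key in lookup:
            m[3] = lookup[key]   # m is a view of the row, so this updates in place
    return graph_array

result = update_with_dict(graph_array.copy(), matrix_coo)
print(result)
```

The work becomes proportional to the number of rows plus the number of stored matrix entries, rather than their product.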
If graph_array is a numpy array, we can test all rows at once, with something like
mask = (graph_array[:, 1:3]==[r,c]).all(axis=1)   # compare 2nd and 3rd columns
graph_array[mask,3] = d
where mask is True for the rows of graph_array that have the right indexes (again, this isn't tested).
To get more speed I'd cast both graph_array and matrix_coo as 2d numpy (dense) arrays, and see if I can solve the problem with a few array operations. Insights from that might help me replace the matrix_coo iteration.
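A rough sketch of what that dense-array idea might look like follows. It was not part of the tested code and rests on several assumptions: the index columns of graph_array are integer-valued, the dense matrix fits in memory, no meaningful entry of matrix_coo is exactly zero (a zero is treated as "no entry"), and there are no duplicate (row, col) entries (toarray() would sum them).

```python
import numpy as np
from scipy import sparse

graph_array = np.array([[1.0, 1.0, 5.0, 9.0],
                        [2.0, 5.0, 6.0, 5.0],
                        [3.0, 5.0, 7.0, 6.0]])
matrix_coo = sparse.coo_matrix(([0.5, 0.4, 0.8], ([1, 2, 5], [5, 8, 7])))

dense = matrix_coo.toarray()              # may be very large for a big matrix
rows = graph_array[:, 1].astype(int)      # assumed integer-valued indices
cols = graph_array[:, 2].astype(int)

# Pairs that fall outside the matrix shape cannot match anything
in_bounds = ((rows >= 0) & (rows < dense.shape[0]) &
             (cols >= 0) & (cols < dense.shape[1]))

values = np.zeros(len(graph_array))
values[in_bounds] = dense[rows[in_bounds], cols[in_bounds]]

mask = values != 0                        # zero is read as "no entry here"
out = graph_array.copy()
out[mask, 3] = values[mask]
print(out)
```

This replaces both loops with a handful of array operations, at the cost of building the dense matrix.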
=========================
Tested code
import numpy as np
from scipy import sparse

graph_array = np.array([[1.0, 1.0, 5.0, 9.0],
                        [2.0, 5.0, 6.0, 5.0],
                        [3.0, 5.0, 7.0, 6.0]])
r,c,d = [1,2,5], [5,8,7], [0.5,0.4,0.8]
matrix_coo = sparse.coo_matrix((d,(r,c)))

def org(graph_array, matrix_coo):
    row_idx = list(matrix_coo.row)
    col_idx = list(matrix_coo.col)
    data_idx = list(matrix_coo.data)
    x = 0
    while x < len(row_idx):
        for m in graph_array:
            if m[1] == row_idx[x]:
                if m[2] == col_idx[x]:
                    m[3] = data_idx[x]
        x += 1
    return graph_array

new_array = org(graph_array.copy(), matrix_coo)
print(graph_array)
print(new_array)

def alt(graph_array, matrix_coo):
    for r,c,d in zip(matrix_coo.row, matrix_coo.col, matrix_coo.data):
        for m in graph_array:
            if (m[[1,2]]==[r,c]).all():   # array test
                m[3] = d
    return graph_array

new_array = alt(graph_array.copy(), matrix_coo)
print(new_array)

def altlist(graph_array, matrix_coo):
    for r,c,d in zip(matrix_coo.row, matrix_coo.col, matrix_coo.data):
        for m in graph_array:
            if (m[1:3]==[r,c]):   # list test
                m[3] = d
    return graph_array

new_array = altlist(graph_array.tolist(), matrix_coo)
print(new_array)

def altarr(graph_array, matrix_coo):
    for r,c,d in zip(matrix_coo.row, matrix_coo.col, matrix_coo.data):
        mask = (graph_array[:, 1:3]==[r,c]).all(axis=1)
        graph_array[mask,3] = d
    return graph_array

new_array = altarr(graph_array.copy(), matrix_coo)
print(new_array)
run
0909:~/mypy$ python3 stack3727173.py
[[ 1. 1. 5. 9.]
[ 2. 5. 6. 5.]
[ 3. 5. 7. 6.]]
[[ 1. 1. 5. 0.5]
[ 2. 5. 6. 5. ]
[ 3. 5. 7. 0.8]]
[[ 1. 1. 5. 0.5]
[ 2. 5. 6. 5. ]
[ 3. 5. 7. 0.8]]
[[1.0, 1.0, 5.0, 0.5], [2.0, 5.0, 6.0, 5.0], [3.0, 5.0, 7.0, 0.80000000000000004]]
[[ 1. 1. 5. 0.5]
[ 2. 5. 6. 5. ]
[ 3. 5. 7. 0.8]]
For this small example, your function is fastest. It also works with both list and array. For small stuff list operations are often faster than array ones. So using array operations to just compare 2 numbers is not an improvement.
Replicating graph_array 1000x, the altarr version is 10x faster than your code. It's performing array operations on the largest dimension. I haven't tried to increase the size of matrix_coo.
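For anyone who wants to repeat that comparison, here is a sketch of one way to set it up. Using np.tile for the replication and timeit for the timing is an assumption about how the measurement could be done, not a record of how it was done; the actual timings will vary by machine, so none are shown.

```python
import timeit
import numpy as np
from scipy import sparse

graph_array = np.array([[1.0, 1.0, 5.0, 9.0],
                        [2.0, 5.0, 6.0, 5.0],
                        [3.0, 5.0, 7.0, 6.0]])
matrix_coo = sparse.coo_matrix(([0.5, 0.4, 0.8], ([1, 2, 5], [5, 8, 7])))

def org(graph_array, matrix_coo):
    # Double Python loop, as in the question
    row_idx = list(matrix_coo.row)
    col_idx = list(matrix_coo.col)
    data_idx = list(matrix_coo.data)
    x = 0
    while x < len(row_idx):
        for m in graph_array:
            if m[1] == row_idx[x]:
                if m[2] == col_idx[x]:
                    m[3] = data_idx[x]
        x += 1
    return graph_array

def altarr(graph_array, matrix_coo):
    # One vectorized row test per stored matrix entry
    for r, c, d in zip(matrix_coo.row, matrix_coo.col, matrix_coo.data):
        mask = (graph_array[:, 1:3] == [r, c]).all(axis=1)
        graph_array[mask, 3] = d
    return graph_array

big = np.tile(graph_array, (1000, 1))    # stack 1000 copies: 3000 rows

# Both versions should agree before their speed is compared
res_org = org(big.copy(), matrix_coo)
res_arr = altarr(big.copy(), matrix_coo)
print(np.array_equal(res_org, res_arr))

t_org = timeit.timeit(lambda: org(big.copy(), matrix_coo), number=5)
t_arr = timeit.timeit(lambda: altarr(big.copy(), matrix_coo), number=5)
print(f"org: {t_org:.4f}s  altarr: {t_arr:.4f}s")
```

Each call gets a fresh copy of big because both functions modify their input in place.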