I have 2 numpy matrices, A and B. Matrix A has only 1 or 0 as possible values (ON or OFF). Matrix B has integers (min value -1). I need to plot a histogram of the elements of matrix B (X-axis) against how often they are listed as ON in matrix A (in the corresponding positions).
For example:
IF A[1][1] and A[2][2] are 1,
AND B[1][1] and B[2][2] are 2,
THEN frequency of 2 should be 2 (similarly for each element of matrix B).
Basically, for each element in B, its frequency increases by 1 if the corresponding element in A is 1.
The matrices I am handling are huge (3992x3992). How do I do this as efficiently as possible?
If the values in B were all small non-negative integers, you could simply do:
count = np.bincount(B.ravel())
tally = np.bincount(B.ravel(), weights=A.ravel())
freq = tally / count
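As a quick check of what the weights argument does, here is a minimal sketch on a made-up 3x3 pair of arrays (the values are illustrative only, not taken from the question). Note that tally holds the raw number of ON positions for each value, while freq is the fraction of that value's occurrences that are ON.

```python
import numpy as np

# Hypothetical toy data: A is the ON/OFF mask, B holds small non-negative integers.
A = np.array([[1, 0, 1],
              [0, 1, 0],
              [1, 1, 0]])
B = np.array([[2, 2, 0],
              [1, 2, 1],
              [0, 3, 3]])

count = np.bincount(B.ravel())                     # how often each value occurs in B
tally = np.bincount(B.ravel(), weights=A.ravel())  # how often it occurs where A is 1
freq = tally / count                               # fraction of occurrences that are ON

print(count)  # occurrences of the values 0, 1, 2, 3
print(tally)  # ON counts of the values 0, 1, 2, 3
```

If some value between 0 and B.max() never appears in B, its count is 0 and the division produces nan for that entry, so such entries may need to be masked out before plotting.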
But because you have negative numbers, which np.bincount does not accept, it is probably best to play it safe and run B through np.unique first:
unq_val, unq_idx = np.unique(B.ravel(), return_inverse=True)
unq_count = np.bincount(unq_idx)
unq_tally = np.bincount(unq_idx, weights=A.ravel())
unq_freq = unq_tally / unq_count
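To see how the inverse indices stand in for the original values, here is a small sketch with made-up arrays that include -1 (the data is an assumption for illustration). np.unique maps each element of B to its position in the sorted array of distinct values, so the indices are always non-negative and safe to pass to np.bincount.

```python
import numpy as np

# Hypothetical toy data containing a negative value.
A = np.array([[1, 0, 1],
              [1, 1, 0]])
B = np.array([[-1, 5, -1],
              [5, 2, 2]])

# unq_val: sorted distinct values of B; unq_idx: index into unq_val for each element
unq_val, unq_idx = np.unique(B.ravel(), return_inverse=True)
unq_count = np.bincount(unq_idx)                     # occurrences of each distinct value
unq_tally = np.bincount(unq_idx, weights=A.ravel())  # occurrences where A is 1
unq_freq = unq_tally / unq_count                     # fraction that are ON

print(unq_val)
print(unq_tally)
```

Because every entry of unq_val occurs at least once, unq_count never contains a zero and the division is always defined.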
With numpy 1.9 or later, you can get an extra performance edge by merging the first two lines into a single call:
unq_val, unq_idx, unq_count = np.unique(B.ravel(), return_inverse=True,
                                        return_counts=True)
After that, you will have your x values in unq_val and the corresponding y values in unq_freq. On my system, with this made-up data:
A = np.random.randint(2, size=(3992, 3992))
B = np.random.randint(50, size=(3992, 3992))
The whole thing runs in 0.3 sec without passing it through unique, and in a little over 6 sec when using it.
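Since np.unique accounts for most of the runtime, one possible alternative (not part of the timings above, and assuming B spans a modest range of integers) is to shift B by its minimum so that every value becomes non-negative and np.bincount can be applied directly. A sketch on a smaller made-up array, where offset, shifted, values and present are names introduced here for illustration:

```python
import numpy as np

# Hypothetical data, smaller than the real matrices; B includes -1.
rng = np.random.default_rng(0)
A = rng.integers(0, 2, size=(200, 200))    # ON/OFF mask
B = rng.integers(-1, 50, size=(200, 200))  # integers from -1 to 49

offset = B.min()                      # most negative value in B
shifted = (B - offset).ravel()        # now every entry is >= 0
count = np.bincount(shifted)          # occurrences per shifted value
tally = np.bincount(shifted, weights=A.ravel())  # occurrences where A is 1

values = np.arange(count.size) + offset  # undo the shift to recover B's values
present = count > 0                      # skip values that never occur in B
x = values[present]
freq = tally[present] / count[present]
```

The arrays x and freq (or tally[present], if the raw ON count is what should be plotted) can then be handed to a bar plot, for instance matplotlib's plt.bar(x, freq). The trade-off is memory: count has one slot for every integer between B.min() and B.max(), so this only pays off when that range is small.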