
Plotting Histogram using data from 2 numpy matrices

I have two NumPy matrices, A and B:

  • A contains only the values 1 or 0 (ON or OFF).
  • B contains integers (minimum value -1).

I need to plot a histogram with the elements of matrix B on the x-axis and, on the y-axis, how often each of them is listed as ON in matrix A (at the corresponding positions).

For example:

IF A[1][1] and A[2][2] are 1, 
AND B[1][1] and B[2][2] are 2, 
THEN frequency of 2 should be 2 (similarly for each element of matrix B).

In other words, each element of B increases the frequency of its value by 1 if the corresponding element of A is 1.
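The rule above amounts to counting the values of B at the positions where A is ON. Below is a minimal sketch using a boolean mask on small made-up 3x3 matrices that reproduce the example (NumPy indices are 0-based, so [1][1] and [2][2] are used directly as array indices). Note that B[0][2] is also 2, but it is not counted because A[0][2] is 0.

```python
import numpy as np

# Small made-up matrices reproducing the example: A is ON at [1][1] and [2][2],
# and B holds the value 2 at both of those positions.
A = np.array([[0, 0, 0],
              [0, 1, 0],
              [0, 0, 1]])
B = np.array([[-1, 5, 2],
              [3, 2, 7],
              [0, 4, 2]])

# Keep only the B values whose corresponding A entry is 1 (ON),
# then count how often each distinct value occurs among them.
on_values = B[A == 1]
values, freqs = np.unique(on_values, return_counts=True)

print(dict(zip(values.tolist(), freqs.tolist())))  # {2: 2}
```

This gives only the values that are ON at least once; the approaches below also report values of B that are never ON.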

The matrices I am handling are huge (3992x3992). How do I do this as efficiently as possible?

If the values in B were all small non-negative integers (np.bincount only accepts non-negative integers), you could simply do:

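import numpy as np
count = np.bincount(B.ravel())                     # how often each value occurs in B
tally = np.bincount(B.ravel(), weights=A.ravel())  # how often each value is ON in A
freq = tally / count                               # fraction ON per value (nan where a value never occurs)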
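To see what each of these arrays holds, here is a small sketch with made-up 2x3 matrices whose B values are non-negative (the case this snippet assumes). tally is the per-value count of ON entries that the question asks for, while freq divides it by how often each value occurs at all. The masked division is an addition of this sketch to avoid the 0/0 for values that never appear in B.

```python
import numpy as np

A = np.array([[1, 0, 1],
              [1, 1, 0]])
B = np.array([[0, 2, 2],
              [3, 2, 0]])

count = np.bincount(B.ravel())                     # occurrences of 0, 1, 2, 3 in B
tally = np.bincount(B.ravel(), weights=A.ravel())  # ON entries per value of B

# Value 1 never occurs in B, so count[1] == 0 and tally[1] / count[1] would be nan;
# divide only where count is nonzero and leave 0.0 elsewhere.
freq = np.divide(tally, count, out=np.zeros_like(tally), where=count > 0)

print(count)  # [2 0 3 1]
print(tally)  # [1. 0. 2. 1.]
```

Here freq comes out as [0.5, 0.0, 0.667, 1.0] (rounded): value 2 occurs three times in B and is ON twice.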

But because you have negative numbers, it is probably best to play it safe and run B through np.unique first, which maps every value of B to a non-negative index into the array of unique values:

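unq_val, unq_idx = np.unique(B.ravel(), return_inverse=True)  # unq_idx: position of each B value in unq_val
unq_count = np.bincount(unq_idx)                     # occurrences of each unique value
unq_tally = np.bincount(unq_idx, weights=A.ravel())  # ON count for each unique value
unq_freq = unq_tally / unq_count                     # fraction ON for each unique value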

With NumPy 1.9 or later, np.unique can also return the counts, so you can get an extra performance edge by replacing the np.unique call and the first np.bincount call with a single one:

unq_val, unq_idx, unq_count = np.unique(B.ravel(), return_inverse=True,
                                        return_counts=True)
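Putting the pieces together, the np.unique route can be wrapped in a small function. This is a sketch; the name on_frequency_by_value is hypothetical, and it returns both the ON count per value (what the question asks for) and the fraction ON (what unq_freq holds). It handles the -1 values in B because np.unique produces non-negative indices.

```python
import numpy as np

def on_frequency_by_value(A, B):
    """Hypothetical helper: for each distinct value in B, return the number of
    positions where A is 1 (ON) and the fraction of that value's occurrences
    that are ON. Requires NumPy >= 1.9 for return_counts."""
    unq_val, unq_idx, unq_count = np.unique(B.ravel(), return_inverse=True,
                                            return_counts=True)
    # unq_idx maps every element of B to its position in unq_val, so a
    # bincount over it, weighted by A, sums the ON flags per distinct value.
    unq_tally = np.bincount(unq_idx, weights=A.ravel())
    unq_freq = unq_tally / unq_count
    return unq_val, unq_tally, unq_freq

# Usage with small made-up matrices containing the minimum value -1.
A = np.array([[1, 0], [1, 1]])
B = np.array([[-1, 3], [-1, 3]])
vals, on_count, on_fraction = on_frequency_by_value(A, B)
```

For this input, -1 appears twice and is ON both times, while 3 appears twice and is ON once.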

After that, you will have your x values in unq_val and the corresponding y values in unq_freq. On my system, with this made-up data:

A = np.random.randint(2, size=(3992, 3992))
B = np.random.randint(50, size=(3992, 3992))

The whole thing runs in 0.3 s with plain np.bincount (without np.unique), and in a little over 6 s when going through np.unique.
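Because the question states that the minimum value of B is -1, a different technique from the np.unique route above is to shift B so that every value is non-negative, call np.bincount directly, and shift the x values back. This skips the sort that np.unique performs, so it should behave more like the plain np.bincount timing, though that is an expectation rather than a measurement. A sketch, assuming B never goes below min_value (the helper name on_frequency_offset is hypothetical):

```python
import numpy as np

def on_frequency_offset(A, B, min_value=-1):
    """Hypothetical helper: per-value ON counts via np.bincount, assuming
    every entry of B is >= min_value (the question states the minimum is -1)."""
    shifted = B.ravel() - min_value                   # now every value is >= 0
    count = np.bincount(shifted)                      # occurrences of each shifted value
    tally = np.bincount(shifted, weights=A.ravel())   # ON entries per shifted value
    present = count > 0                               # keep only values that occur in B
    x = np.flatnonzero(present) + min_value           # undo the shift to recover B's values
    return x, tally[present], tally[present] / count[present]

# Usage with small made-up matrices.
A = np.array([[1, 0], [1, 1]])
B = np.array([[-1, 3], [-1, 3]])
x, on_count, on_fraction = on_frequency_offset(A, B)
```

Unlike np.unique, this allocates one bin for every integer between min_value and the maximum of B, which suits the small value range in the question; if B's values are spread very widely, the np.unique route is the safer choice.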


 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM