
Number of values greater than a threshold

I have a matrix A. I want to find the number of elements greater than 5 and their corresponding indices. How can I do this in MATLAB without using a for loop?

For example, if A = [1 4 6 8 9 5 6 8 9]':

  • Number of elements > 5: 6
  • Indices: [3 4 5 7 8 9]

You can use find:

index = find(A>5);
numberOfElements = length(index);

Or you can use sum, which gets the number of elements in one command. For a 2-D matrix, index with A(:) first, because sum on a matrix returns one count per column:

numberOfElements = sum(A(:)>5);

Do you really need explicit indices? The logical matrix A>5 can itself be used as an index (usually a tad more efficient than indexing with find):

index = (A>5);
numberOfElements = sum(index(:));
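
If A is a true 2-D matrix rather than a vector, the shape matters for counting, so here is a minimal sketch of the difference. The matrix M is made up for illustration and is not from the question; nnz is the built-in that counts nonzero (here: true) elements.

```matlab
% Hypothetical 3x3 example matrix (not from the question)
M = [1 4 6; 8 9 5; 6 8 9];

mask = M > 5;               % logical matrix, same size as M

perColumn = sum(mask);      % sums along dimension 1: one count per column
total1    = sum(mask(:));   % flatten to a column first, then sum: total count
total2    = nnz(mask);      % counts the true entries directly
```

Here perColumn is a 1x3 row vector, while total1 and total2 are scalars, which is usually what "number of elements" means.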

For completeness: indexing with logicals returns the same elements as indexing with regular indices:

>> A(A>5)
ans = 
     6  8  9  6  8  9
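
Since the question mentions a matrix, it may help to see what "indices" means in 2-D. The sketch below assumes a made-up 3x3 matrix M; with one output, find returns column-major linear indices, and with two outputs it returns row and column subscripts. ind2sub converts between the two.

```matlab
% Hypothetical 3x3 example matrix (not from the question)
M = [1 4 6; 8 9 5; 6 8 9];

% One output: linear indices, counted down the columns
linearIdx = find(M > 5);

% Two outputs: row and column subscripts of the same elements
[row, col] = find(M > 5);

% Converting the linear indices gives the same subscripts
[row2, col2] = ind2sub(size(M), linearIdx);

% Both index forms select the same values
values = M(linearIdx);
```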

Motivated by the discussion with Rody above, here is a simple benchmark that tests the speed of integer vs. logical array indexing in MATLAB. This is quite important, I would say, since 'vectorized' MATLAB is mostly about indexing:

% random data
a = rand(10^7, 1);

% threshold - controls how much of the data meets the a>threshold criterion.
% This determines the total indexing time: the more data we extract from a,
% the longer it takes.
% In this example the threshold is small, so most of the data in a
% meets the criterion.
threshold = 0.08;

% prepare logical and integer indices (note the uint32 cast)
index_logical = a>threshold;
index_integer = uint32(find(index_logical));

% logical indexing of a
tic
for i=1:10
    b = a(index_logical);
end
toc

% integer indexing of a
tic
for i=1:10
    b = a(index_integer);
end
toc

On my computer the results are

Elapsed time is 0.755399 seconds.
Elapsed time is 0.728462 seconds.

meaning that the two methods perform almost the same; that is how I chose the example threshold. It is interesting, because the index_integer array takes almost 4 times as much memory!

  Name                    Size                 Bytes  Class
  index_integer       9198678x1              36794712  uint32
  index_logical      10000000x1              10000000  logical

For larger values of the threshold, integer indexing is faster. Results for threshold = 0.5:

Elapsed time is 0.687044 seconds. (logical)
Elapsed time is 0.296044 seconds. (integer)

Unless I am doing something wrong here, integer indexing seems to be faster most of the time.
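
As a cross-check on the tic/toc loops, the same comparison can be sketched with timeit, which calls the function handle several times and reports a typical time. This is only a sketch under assumptions: the array is smaller than the 10^7 used above to keep it quick, the variable names t_logical, t_double and t_integer are mine, and the numbers will differ between machines and MATLAB releases.

```matlab
a = rand(1e6, 1);           % smaller than the 10^7 used above (assumption)
threshold = 0.5;

% Prepare the three kinds of index up front, so only indexing is timed
index_logical = a > threshold;
index_double  = find(index_logical);
index_integer = uint32(index_double);

% timeit takes a zero-argument function handle and returns seconds
t_logical = timeit(@() a(index_logical));
t_double  = timeit(@() a(index_double));
t_integer = timeit(@() a(index_integer));

fprintf('logical: %.5f s\ndouble:  %.5f s\ninteger: %.5f s\n', ...
        t_logical, t_double, t_integer);
```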

Including the creation of the indices in the test yields very different results, however:

a = rand(1e7, 1);    
threshold = 0.5;

% logical 
tic
for i=1:10
    inds = a>threshold;
    b = a(inds);
end
toc

% double
tic
for i=1:10
    inds = find(a>threshold);
    b = a(inds);
end
toc

% integer 
tic
for i=1:10
    inds = uint32(find(a>threshold));
    b = a(inds);
end
toc

Results (Rody):

Elapsed time is 1.945478 seconds. (logical)
Elapsed time is 3.233831 seconds. (double)
Elapsed time is 3.508009 seconds. (integer)

Results (angainor):

Elapsed time is 1.440018 seconds. (logical)
Elapsed time is 1.851225 seconds. (double)
Elapsed time is 1.726806 seconds. (integer)

So it would seem that the actual indexing is faster when indexing with integers, but front-to-back, logical indexing performs much better.

The runtime difference between the last two methods is unexpected, though: it seems MATLAB's internals either do not cast the doubles to integers, or perform error-checking on each element before doing the actual indexing. Otherwise, we would have seen virtually no difference between the double and integer methods.

Edit: There are two options as I see it:

  • MATLAB converts double indices to uint32 indices explicitly before the indexing call (much like we do in the integer test)
  • MATLAB passes doubles and performs the double->int cast on the fly during the indexing call

The second option should be faster, because the double indices only have to be read once. In our explicit conversion test we have to read the double indices, write the integer indices, and then read the integer indices again during the actual indexing. So MATLAB should be faster... Why is it not?
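
One way to look into this is to time each stage on its own, so that the cost of find, of the explicit uint32 cast, and of the indexing itself are visible separately. This is only a sketch: it cannot show what MATLAB does internally, the variable names are mine, and the array is smaller than in the tests above.

```matlab
a = rand(1e6, 1);           % smaller than the 1e7 used above (assumption)
threshold = 0.5;

mask  = a > threshold;      % logical mask
idx   = find(mask);         % double indices
idx32 = uint32(idx);        % explicitly converted indices

% Time each stage separately (seconds per call)
t_compare   = timeit(@() a > threshold);   % building the logical mask
t_find      = timeit(@() find(mask));      % mask -> double indices
t_cast      = timeit(@() uint32(idx));     % double -> uint32 conversion
t_idxDouble = timeit(@() a(idx));          % indexing with doubles
t_idxUint32 = timeit(@() a(idx32));        % indexing with uint32

fprintf('compare %.5f | find %.5f | cast %.5f | a(double) %.5f | a(uint32) %.5f\n', ...
        t_compare, t_find, t_cast, t_idxDouble, t_idxUint32);
```

If t_cast plus t_idxUint32 comes out close to t_idxDouble, that would be consistent with the first option; a clearly smaller t_idxDouble would point towards the second. Either way it is indirect evidence only.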

