
Count of values in numpy.ndarray

Is there any way to do the following in purely numpy (or opencv)?

img = cv2.imread("test.jpg")
counts = defaultdict(int)
for row in img:
    for val in row:
        counts[tuple(val)] += 1

The problem is that tuple(val) can be any of 2^24 different values, so a dense array with one slot per possible value is impractical: it would be gigantic and mostly zeros. I need a more efficient data structure.

The fastest way around this, if the image is stored in "chunky" format (i.e. the color-plane dimension is the last one and that dimension is contiguous), is to take an np.void view of every 24-bit pixel, then run the result through np.unique and np.bincount:

>>> arr = np.random.randint(256, size=(10, 10, 3)).astype(np.uint8)
>>> dt = np.dtype((np.void, arr.shape[-1]*arr.dtype.itemsize))
>>> if arr.strides[-1] != arr.dtype.itemsize:
...     arr = np.ascontiguousarray(arr)
... 
>>> arr_view = arr.view(dt)

The contents of arr_view look like garbage:

>>> arr_view[0, 0]
array([Â], 
      dtype='|V3')

But we are not the ones who have to interpret the contents:

>>> unq, inv = np.unique(arr_view, return_inverse=True)
>>> unq_cnts = np.bincount(inv.ravel())
>>> unq = unq.view(arr.dtype).reshape(-1, arr.shape[-1])

And now you have the unique pixels and their counts in those two arrays:

>>> unq[:5]
array([[  0,  82,  78],
       [  6, 221, 188],
       [  9, 209,  85],
       [ 14, 210,  24],
       [ 14, 254,  88]], dtype=uint8)
>>> unq_cnts[:5]
array([1, 1, 1, 1, 1], dtype=int64)
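
The steps above can be collected into one function. This is only a sketch: the name count_pixels_void is made up for illustration, and it assumes the channel axis is last. The inverse index is flattened before np.bincount because np.bincount only accepts 1-D input.

```python
import numpy as np

def count_pixels_void(arr):
    """Return (unique_pixels, counts) for an (H, W, C) array.

    Hypothetical helper: each pixel is viewed as one opaque np.void item
    so that np.unique can compare whole pixels at once.
    """
    arr = np.ascontiguousarray(arr)            # the view needs contiguous pixels
    dt = np.dtype((np.void, arr.shape[-1] * arr.dtype.itemsize))
    flat = arr.view(dt).ravel()                # one void item per pixel
    unq, inv = np.unique(flat, return_inverse=True)
    counts = np.bincount(inv.ravel())          # occurrences of each unique pixel
    # Turn the opaque items back into rows of channel values.
    unq = unq.view(arr.dtype).reshape(-1, arr.shape[-1])
    return unq, counts

# Small synthetic image: few distinct values, so repeats are likely.
rng = np.random.default_rng(0)
demo = rng.integers(0, 4, size=(6, 5, 3), dtype=np.uint8)
demo_unq, demo_cnt = count_pixels_void(demo)
```

Row i of demo_unq is a distinct pixel and demo_cnt[i] is how often it occurs.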

Here is my solution:

  • convert the image to a one-dimensional array with dtype=uint32
  • sort() the array
  • use diff() to find every position where the color changes
  • use diff() again to get the count of each color

The code:

In [50]:
from collections import defaultdict
import cv2
import numpy as np
img = cv2.imread("test.jpg")

In [51]:
%%time
counts = defaultdict(int)
for row in img:
    for val in row:
        counts[tuple(val)] += 1
Wall time: 1.29 s

In [53]:
%%time
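# Pad each pixel with a zero byte so it fills 4 bytes, then view it as one uint32.
img2 = np.concatenate((img, np.zeros_like(img[:, :, :1])), axis=2).view(np.uint32).ravel()
img2.sort()
# Start index of each run of equal values in the sorted array.
pos = np.r_[0, np.where(np.diff(img2) != 0)[0] + 1]
# Run lengths; the last run extends to the end of the array.
count = np.r_[np.diff(pos), len(img2) - pos[-1]]
# Unpack the first value of each run back into its channel bytes.
# cv2.imread returns BGR, so the names r, g, b are only labels here.
r, g, b, _ = img2[pos].view(np.uint8).reshape(-1, 4).T
colors = zip(r, g, b)
result = dict(zip(colors, count))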
Wall time: 177 ms

In [49]:
counts == result
Out[49]:
True
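
The same sort-and-diff idea can be tried without an image file. The following is a sketch under assumptions, not the timed code above: count_pixels_sorted is a hypothetical name, the input is assumed to be a non-empty (H, W, 3) uint8 array, and the result keys are plain Python ints in the array's own channel order.

```python
import numpy as np

def count_pixels_sorted(img):
    """Count distinct pixels of a non-empty (H, W, 3) uint8 array via sort + diff."""
    h, w, _ = img.shape
    # Pad to 4 bytes per pixel so each pixel can be viewed as one uint32.
    padded = np.zeros((h, w, 4), dtype=np.uint8)
    padded[..., :3] = img
    packed = np.sort(padded.view(np.uint32).ravel())
    # Index where each run of equal values starts.
    pos = np.r_[0, np.flatnonzero(np.diff(packed)) + 1]
    # Length of each run; the last one extends to the end.
    count = np.r_[np.diff(pos), packed.size - pos[-1]]
    # Unpack the run heads back into bytes and drop the padding byte.
    channels = packed[pos].view(np.uint8).reshape(-1, 4)[:, :3]
    return {tuple(int(v) for v in px): int(n) for px, n in zip(channels, count)}

# Synthetic stand-in for an image loaded with cv2.imread.
rng = np.random.default_rng(1)
demo = rng.integers(0, 3, size=(8, 7, 3), dtype=np.uint8)
demo_counts = count_pixels_sorted(demo)
```

Viewing the packed values back as uint8 restores the original byte order on any platform, because the same native byte order is used in both directions.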

If you can use pandas, you can call pandas.value_counts(), which is implemented in Cython with a hash table. The Series.value_counts() method gives the same counts.
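
A possible way to apply this to an image, offered as a sketch only: pack each pixel into a uint32 as in the previous answer, then count with Series.value_counts(). The helper name count_pixels_pandas is hypothetical, and an (H, W, 3) uint8 input is assumed.

```python
import numpy as np
import pandas as pd

def count_pixels_pandas(img):
    """Count distinct pixels of an (H, W, 3) uint8 array with pandas."""
    h, w, _ = img.shape
    # Pad to 4 bytes per pixel so each pixel becomes a single uint32 key.
    padded = np.zeros((h, w, 4), dtype=np.uint8)
    padded[..., :3] = img
    packed = padded.view(np.uint32).ravel()
    vc = pd.Series(packed).value_counts(sort=False)
    # Cast the index back to uint32 explicitly; older pandas may widen it.
    keys = np.ascontiguousarray(vc.index.to_numpy(), dtype=np.uint32)
    channels = keys.view(np.uint8).reshape(-1, 4)[:, :3]
    return {tuple(int(v) for v in px): int(n)
            for px, n in zip(channels, vc.to_numpy())}

# Synthetic stand-in for an image loaded with cv2.imread.
rng = np.random.default_rng(2)
demo = rng.integers(0, 3, size=(5, 4, 3), dtype=np.uint8)
demo_counts = count_pixels_pandas(demo)
```

Unlike the sort-based version, this does not need the pixels in sorted order, so sort=False skips ordering the result by frequency.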


 