
Counting number of occurrences of an array in array of numpy 2D arrays

I have a 2D numpy array (an array of arrays):

samples = np.array([[1,2,3], [2,3,4], [4,5,6], [1,2,3], [2,3,4], [2,3,4]])

I need to count how many times each inner array occurs in the array above, like this:

counts = [[1,2,3]:2, [2,3,4]:3, [4,5,6]:1]

I'm not sure how to count the arrays, or how to list the result the way I have above so that each array stays connected to its count. Any help is appreciated. Thank you!

Everything you need is available directly in numpy:

import numpy as np

a = np.array([[1,2,3], [2,3,4], [4,5,6], [1,2,3], [2,3,4], [2,3,4]])

print(np.unique(a, axis=0, return_counts=True))

Result:

(array([[1, 2, 3],
       [2, 3, 4],
       [4, 5, 6]]), array([2, 3, 1], dtype=int64))

The result is a tuple of an array with the unique rows, and an array with the counts of those rows.

If you need to go through them pairwise:

unique_rows, counts = np.unique(a, axis=0, return_counts=True)

for row, c in zip(unique_rows, counts):
    print(row, c)

Result:

[1 2 3] 2
[2 3 4] 3
[4 5 6] 1
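
If a mapping closer to the one sketched in the question is wanted, one possible approach is to turn each unique row into a tuple and use it as a dictionary key. This is a minimal sketch that assumes a plain dict is an acceptable output format; the name row_counts is only illustrative and does not come from the answer above.

```python
import numpy as np

samples = np.array([[1, 2, 3], [2, 3, 4], [4, 5, 6], [1, 2, 3], [2, 3, 4], [2, 3, 4]])

unique_rows, counts = np.unique(samples, axis=0, return_counts=True)

# NumPy arrays are not hashable, so each row is converted to a tuple
# before it is used as a dictionary key. int(c) turns the NumPy integer
# into a plain Python int.
row_counts = {tuple(row.tolist()): int(c) for row, c in zip(unique_rows, counts)}

print(row_counts)
# {(1, 2, 3): 2, (2, 3, 4): 3, (4, 5, 6): 1}
```

Because np.unique returns the unique rows in sorted order, the dictionary keys appear in sorted order rather than in order of first appearance.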

Here's a way of doing it without using much of the numpy library:

import numpy as np
samples = np.array([[1,2,3], [2,3,4], [4,5,6], [1,2,3], [2,3,4], [2,3,4]])

result = {}

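for row in samples:
    # Look for an existing entry whose stored row equals the current row.
    in_dictionary = False
    for check in range(len(result)):
        if np.all(result[str(check)][0] == row):
            result[str(check)][1] += 1
            in_dictionary = True
            break  # each distinct row is stored only once, so stop searching
    if not in_dictionary:
        # First occurrence: store [row, count] under the next string index.
        result[str(len(result))] = [row, 1]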


print("------------------")
print(result)

This method creates a dictionary called result, then loops through the rows of samples and checks whether each one is already in the dictionary. If it is, the count of how many times it has appeared is increased by 1. Otherwise, a new entry is created for that array. The keys are string indices ("0", "1", ...) assigned in order of first appearance. The saved counts and values can then be accessed with result["index"] for the index you want: result["index"][0] is the array value and result["index"][1] is the number of times it appeared.
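
The access pattern described above can be sketched as follows. To keep the snippet self-contained, result is written out by hand in the same shape the loop produces for the sample data; the name by_row and the conversion to tuple keys are illustrative assumptions, not part of the original answer.

```python
import numpy as np

# Same structure the loop builds: string index -> [row, count],
# in order of first appearance in samples.
result = {
    "0": [np.array([1, 2, 3]), 2],
    "1": [np.array([2, 3, 4]), 3],
    "2": [np.array([4, 5, 6]), 1],
}

# Look up a single entry by its string index.
first_row, first_count = result["0"]
print(first_row, first_count)

# Walk over every stored row together with its count.
for key, (row, count) in result.items():
    print(key, row.tolist(), count)

# Optionally re-key by the row contents, so a count can be looked up
# directly from a row instead of from its index.
by_row = {tuple(row.tolist()): count for row, count in result.values()}
print(by_row[(2, 3, 4)])
```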

There is also a plain Python method that is relatively fast compared with other Python (no numpy) solutions:

>>> from collections import Counter
>>> Counter(map(tuple, samples.tolist()))  # convert to dict if you need it
Counter({(2, 3, 4): 3, (1, 2, 3): 2, (4, 5, 6): 1})

Python does this quite fast too, because operations on tuples are well optimised.
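
As a small usage sketch for the Counter approach (the variable name counts and the particular lookups are illustrative, not from the answer), the result can be converted to a plain dict or queried directly:

```python
from collections import Counter

import numpy as np

samples = np.array([[1, 2, 3], [2, 3, 4], [4, 5, 6], [1, 2, 3], [2, 3, 4], [2, 3, 4]])

# Each row becomes a tuple so that it is hashable and can be counted.
counts = Counter(map(tuple, samples.tolist()))

# A plain dict keeps the order in which rows were first seen.
print(dict(counts))
# {(1, 2, 3): 2, (2, 3, 4): 3, (4, 5, 6): 1}

# The most frequent row and its count.
print(counts.most_common(1))
# [((2, 3, 4), 3)]

# A Counter returns 0 for rows that never occurred instead of raising KeyError.
print(counts[(9, 9, 9)])
# 0
```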

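from collections import Counter

import benchit
import numpy as np

# Jupyter/IPython magic; remove the next line when running as a plain script
%matplotlib inline
benchit.setparams(rep=3)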

sizes = [3, 10, 30, 100, 300, 900, 3000, 9000, 30000, 90000, 300000, 900000, 3000000]
arr = np.random.randint(0,10, size=(sizes[-1], 3)).astype(int)


def count_python(samples):
    return Counter(map(tuple, samples.tolist()))
    
def count_numpy(samples):
    return np.unique(samples, axis=0, return_counts=True)

fns = [count_python, count_numpy]
in_ = {s: (arr[:s],) for s in sizes}
t = benchit.timings(fns, in_, multivar=True, input_name='Number of items')
t.plot(logx=True, figsize=(12, 6), fontsize=14)

Note that arr.tolist() alone consumes about 0.8 sec of Python computing time for the 3M-row array.
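
One way to check how much of the Counter approach is spent in tolist() is to time that step separately. The following is a rough sketch using the standard timeit module rather than benchit; the array size (100,000 rows), the repeat count, and the variable names are arbitrary assumptions, and the timings will differ from machine to machine.

```python
import timeit
from collections import Counter

import numpy as np

# Hypothetical test data: 100,000 rows of three digits each.
rng = np.random.default_rng(0)
arr = rng.integers(0, 10, size=(100_000, 3))

repeats = 3

# Average time for the array-to-list conversion alone.
t_tolist = timeit.timeit(lambda: arr.tolist(), number=repeats) / repeats

# Average time for the full Counter-based count, conversion included.
t_total = timeit.timeit(lambda: Counter(map(tuple, arr.tolist())), number=repeats) / repeats

print(f"tolist only: {t_tolist:.4f} s")
print(f"full count:  {t_total:.4f} s")
```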

