
Is there a way to find the counts of unique binary arrays in a list of arrays?

If I had a numpy array in the form of

[[0. 1. 1. 1. 1.],
[1. 0. 0. 0. 0.], 
[1. 0. 0. 0. 0.],
[1. 0. 0. 0. 0.],
[1. 0. 0. 0. 0.],
[0. 1. 1. 1. 1.]]

is there a way to determine the frequency of these binary arrays?

Using the example listed above, the frequencies would be something like [1. 0. 0. 0. 0.] - 4, [0. 1. 1. 1. 1.] - 2. I've tried using np.unique, but that returns the counts of the individual unique numbers, which isn't helpful in this case.

from collections import Counter

counts = Counter(map(tuple, arr))

map(tuple, arr) converts each row of the array to a tuple which is hashable and thus can be stored in a mapping like Counter.
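As a minimal end-to-end sketch: the snippet above does not define `arr`, so the array below is assumed to be the example from the question, and the cast to `int` is an optional extra so that the keys print as plain integers.

```python
from collections import Counter

import numpy as np

# Example data from the question (assumed to be a 2-D float array).
arr = np.array([[0., 1., 1., 1., 1.],
                [1., 0., 0., 0., 0.],
                [1., 0., 0., 0., 0.],
                [1., 0., 0., 0., 0.],
                [1., 0., 0., 0., 0.],
                [0., 1., 1., 1., 1.]])

# Convert to nested lists of Python ints, then hash each row as a tuple.
counts = Counter(map(tuple, arr.astype(int).tolist()))

# most_common() yields (row, count) pairs, most frequent first.
for row, n in counts.most_common():
    print(row, n)
# (1, 0, 0, 0, 0) 4
# (0, 1, 1, 1, 1) 2
```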

Using only numpy:

import numpy as np

b = np.array([[0, 1, 1, 1, 1],
              [1, 0, 0, 0, 0],
              [1, 0, 0, 0, 0],
              [1, 0, 0, 0, 0],
              [1, 0, 0, 0, 0],
              [0, 1, 1, 1, 1]])

c = np.unique(b, axis=0, return_counts=True)
print(c)

returns:

(array([[0, 1, 1, 1, 1],
       [1, 0, 0, 0, 0]]), array([2, 4], dtype=int64))
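The returned tuple can be unpacked into something closer to the format asked for in the question. This is only a sketch: the names `rows` and `freq` are illustrative, and the `axis` argument of np.unique needs NumPy 1.13 or later.

```python
import numpy as np

b = np.array([[0, 1, 1, 1, 1],
              [1, 0, 0, 0, 0],
              [1, 0, 0, 0, 0],
              [1, 0, 0, 0, 0],
              [1, 0, 0, 0, 0],
              [0, 1, 1, 1, 1]])

# axis=0 makes np.unique compare whole rows instead of single elements.
rows, counts = np.unique(b, axis=0, return_counts=True)

# Pair each unique row with its count, using plain Python types as keys/values.
freq = {tuple(row.tolist()): int(n) for row, n in zip(rows, counts)}
print(freq)
# {(0, 1, 1, 1, 1): 2, (1, 0, 0, 0, 0): 4}
```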

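Others have already given you the answer, but I just wanted to point out that if you turn the inner arrays into something like tuples, as others have suggested, np.unique does return the counts, as its fourth return value.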
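To illustrate the point about the return order: when all three optional flags are switched on, the counts come last in the tuple that np.unique returns. A small sketch, assuming the row-wise `axis=0` form from the numpy answer above rather than tuples; the variable names are illustrative.

```python
import numpy as np

a = np.array([[0, 1, 1, 1, 1],
              [1, 0, 0, 0, 0],
              [1, 0, 0, 0, 0],
              [1, 0, 0, 0, 0],
              [1, 0, 0, 0, 0],
              [0, 1, 1, 1, 1]])

# Return order: unique rows, index of first occurrence, inverse mapping, counts.
rows, first_index, inverse, counts = np.unique(
    a, axis=0, return_index=True, return_inverse=True, return_counts=True
)

print(first_index.tolist())  # [0, 1]
print(counts.tolist())       # [2, 4]
```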

If your array has no more than 64 columns, then you can convert the rows to numbers and then count with np.unique :

import numpy as np

data = np.array([[0., 1., 1., 1., 1.],
                 [1., 0., 0., 0., 0.], 
                 [1., 0., 0., 0., 0.],
                 [1., 0., 0., 0., 0.],
                 [1., 0., 0., 0., 0.],
                 [0., 1., 1., 1., 1.]])
# Convert each row into an integer
b = 1 << np.arange(data.shape[1], dtype=np.uint64)
nums = (b * data.astype(np.uint64)).sum(1)
# Count occurrences
vals, counts = np.unique(nums, return_counts=True)
# Make result
result = {tuple(((v & b) != 0).astype(np.uint8).tolist()): int(c) for v, c in zip(vals, counts)}
print(result)
# {(1, 0, 0, 0, 0): 4, (0, 1, 1, 1, 1): 2}

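You could try:

arrays = [[0., 1., 1., 1., 1.],
          [1., 0., 0., 0., 0.],
          [1., 0., 0., 0., 0.],
          [1., 0., 0., 0., 0.],
          [1., 0., 0., 0., 0.],
          [0., 1., 1., 1., 1.]]
# Rows must be tuples to be hashable and stored in a set
print(len(set(tuple(i) for i in arrays)))

This removes duplicates and then measures the length of the result, so it gives the number of distinct rows (2 here) rather than how often each one occurs.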

Looks like you don't have experience with text mining. To expand your horizons, how about thinking of each row in the matrix as a string, i.e., a word?

When working with words, you can use a hash table (dictionary) to count the number of times each word has been used in a list. Dictionaries store what are called Key-Value pairs. The first time a word is seen, it becomes a unique Key. Thereafter, if you feed the dictionary a word that has already been seen, it is found among the existing Keys, and you simply increment the Value for that Key by one.
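The idea can be sketched in a few lines of plain Python. The data is the example from the question written with integer entries, and the names `rows`, `word` and `counts` are only illustrative.

```python
rows = [[0, 1, 1, 1, 1],
        [1, 0, 0, 0, 0],
        [1, 0, 0, 0, 0],
        [1, 0, 0, 0, 0],
        [1, 0, 0, 0, 0],
        [0, 1, 1, 1, 1]]

counts = {}
for row in rows:
    # Turn the row into a "word", e.g. [0, 1, 1, 1, 1] -> "01111".
    word = "".join(str(bit) for bit in row)
    # First sighting creates the key with value 1; later ones add one.
    counts[word] = counts.get(word, 0) + 1

print(counts)
# {'01111': 2, '10000': 4}
```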

If you want to generate unique combinations of binary numbers (0,1), look at the "revolving door algorithm" by Donald Knuth.


 