
How do I randomly get a certain number of elements of a numpy array with at least one element from each class?

I have a dataset of 400 images: 10 images of each of 40 different people. There are two NumPy arrays: "olivetti_faces" contains the images (400x64x64), and "olivetti_faces_target" contains the classes of those images (400), one class per person. So "olivetti_faces" has the form array([<img1>, <img2>, ..., <img400>]), where each <img> is a 64x64 array of numbers, and "olivetti_faces_target" has the form array([0, 0, ..., 39]).

You can access the dataset here. After downloading, you can load the arrays as follows:

import numpy as np
data=np.load("olivetti_faces.npy")
target=np.load("olivetti_faces_target.npy")

I would like to randomly choose 100 of the images, with at least one image of each of the 40 people. How can I achieve this in NumPy?

So far I can randomly get 100 images using the following code:

n = 100 # number of images to retrieve
rand_indeces = np.random.choice(data.shape[0], n, replace=False)
data_random = data[rand_indeces]
target_random = target[rand_indeces]

But this does not guarantee that data_random includes at least one image from each of the 40 classes.

As suggested in my comment, first pick a random index from each class. Then choose the remaining random indexes from the entire array. This guarantees that each class has an entry in the final result.

Since each class has 10 consecutive elements, you can loop through classes 0-39 and pick an offset 0-9 within each block.

Try this code:

import numpy as np
import random

data=np.load("olivetti_faces.npy")
target=np.load("olivetti_faces_target.npy")

# target is in groups of 10, so select a random index in each block
rndindex = []  # chosen indexes into data/target
for i in range(40):  # class 0-39
   rndindex.append(i*10 + random.randint(0,9)) # one per class
   
for i in range(60):  # up to 100
   idx = rndindex[0]
   while idx in rndindex:  # prevent duplicates
       idx = random.randint(0,399)  # other indexes can be anywhere
   rndindex.append(idx)

data_random = data[rndindex]      # the 100 selected images
target_random = target[rndindex]  # their class labels

print(rndindex)
#print(data_random)

Output (note that the first 40 indexes fall one per block of 10):

[9, 17, 23, 31, 41, 52, 60, 72, 83, 95, 
 100, 119, 121, 136, 140, 150, 166, 175, 188, 198, 
 209, 211, 221, 238, 243, 250, 261, 276, 289, 290, 
 306, 315, 325, 333, 344, 351, 368, 376, 382, 391, 
 62, 296, 327, 241, 393, 215, 64, 59, 185, 286, 
 162, 163, 364, 309, 220, 273, 32, 214, 217, 182, 
 172, 98, 19, 358, 92, 322, 68, 399, 226, 285, 
 103, 155, 249, 1, 75, 303, 311, 125, 339, 106, 
 127, 94, 101, 113, 35, 20, 189, 199, 128, 30, 
 131, 317, 337, 156, 340, 99, 397, 385, 384, 193]
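The same idea can be sketched with NumPy's Generator API. This is only an illustration, not part of the original answer: the function name sample_with_all_classes and the stand-in target array are assumptions, and the sketch looks up each class's indexes with np.flatnonzero instead of relying on the blocks-of-10 layout.

```python
import numpy as np

def sample_with_all_classes(target, n, rng=None):
    """Return n distinct indexes into target that cover every class at least once.

    Hypothetical helper for illustration; not from the original answer.
    """
    rng = np.random.default_rng(rng)
    target = np.asarray(target)
    classes = np.unique(target)
    if n < classes.size or n > target.size:
        raise ValueError("n must be between the number of classes and len(target)")
    # One random index per class; works for any class layout.
    picked = np.array([rng.choice(np.flatnonzero(target == c)) for c in classes])
    # Fill the remaining slots from the indexes that were not picked yet.
    remaining = np.setdiff1d(np.arange(target.size), picked)
    extra = rng.choice(remaining, size=n - classes.size, replace=False)
    idx = np.concatenate([picked, extra])
    rng.shuffle(idx)  # so the guaranteed picks are not always first
    return idx

# Stand-in labels with the same layout as olivetti_faces_target (assumption).
target = np.repeat(np.arange(40), 10)
idx = sample_with_all_classes(target, 100, rng=0)
target_random = target[idx]  # data[idx] would select the matching images
```

Like the loop above, this does not sample uniformly among all valid 100-image subsets, because one image per class is fixed before the rest are drawn. If that matters, a retry-until-valid approach is the uniform alternative.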

One way to do this, as @xrisk suggested, is to keep randomly drawing 100 images until the condition is satisfied, as follows:

n = 100 # number of images to retrieve
n_unique = len(np.unique(target)) # number of classes
containsAllCats = False
while not containsAllCats:
    rand_indeces = np.random.choice(data.shape[0], n, replace=False)
    data_random = data[rand_indeces]
    target_random = target[rand_indeces]
    containsAllCats = len(np.unique(target_random)) == n_unique

However, it seems rather inefficient: most of the time it requires several iterations.
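To put a number on "several iterations", the chance that a single draw covers every class can be computed exactly by inclusion-exclusion over the classes that are missing. The sketch below assumes equal-sized classes (40 classes of 10 here); the function name prob_all_classes is made up for illustration.

```python
from fractions import Fraction
from math import comb

def prob_all_classes(n_classes, per_class, n_draw):
    """Probability that n_draw items drawn without replacement hit every class.

    Assumes n_classes classes of per_class items each (hypothetical helper).
    """
    total = n_classes * per_class
    # Inclusion-exclusion: the k-th term counts draws that avoid k chosen classes.
    hits = sum(
        (-1) ** k * comb(n_classes, k) * comb(total - k * per_class, n_draw)
        for k in range(n_classes + 1)
    )
    return Fraction(hits, comb(total, n_draw))

p = prob_all_classes(40, 10, 100)
print(f"P(success per attempt) = {float(p):.4f}")
print(f"expected attempts      = {float(1 / p):.1f}")
```

The expected number of loop iterations is 1/p, since each attempt succeeds independently with probability p.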


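Notice: technical posts on this site follow the CC BY-SA 4.0 license. If you republish, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.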
