How do I randomly get a certain number of elements of a numpy array with at least one element from each class?
I have a dataset of 400 images: 10 images of each of 40 different people. There are 2 NumPy arrays: "olivetti_faces" contains the images (400x64x64), and "olivetti_faces_target" contains the classes of those images (400), one class for each person. So "olivetti_faces" is of the form:
array([<img1>, <img2>, ..., <img400>])
where <img> is a 64x64 array of numbers, and "olivetti_faces_target" is of the form:
array([0, 0, ..., 39])
You can access the dataset here. After downloading, you can load the arrays as follows:
import numpy as np
data=np.load("olivetti_faces.npy")
target=np.load("olivetti_faces_target.npy")
I would like to randomly choose 100 of the images, with at least one image of each of the 40 people. How can I achieve this in NumPy?
So far I could randomly get 100 images using the following code:
n = 100 # number of images to retrieve
rand_indeces = np.random.choice(data.shape[0], n, replace=False)
data_random = data[rand_indeces]
target_random = target[rand_indeces]
But it does not guarantee that at least one image of each of the 40 classes is included in data_random.
As suggested in my comment, first pick a random index from each class. Then choose the remaining random indexes from the entire array, skipping any that were already picked. This guarantees that each class has an entry in the final result.
Since each class has 10 consecutive elements, you can loop through classes 0-39 and pick an offset 0-9 within each block.
Try this code:
import numpy as np
import random
data=np.load("olivetti_faces.npy")
target=np.load("olivetti_faces_target.npy")
rndindex = [] # selected indexes
# target is groups of 10, so select random index in each block
for i in range(40): # class 0-39
    rndindex.append(i*10 + random.randint(0,9)) # one per class
for i in range(60): # up to 100
    idx = rndindex[0]
    while idx in rndindex: # prevent duplicates
        idx = random.randint(0,399) # other indexes can be anywhere
    rndindex.append(idx)
rand_indeces = [] # np array objects
for idx in rndindex:
    rand_indeces.append(data[idx])
print(rndindex)
#print(rand_indeces)
Output (note that the first 40 indexes each fall within a different block of 10):
[9, 17, 23, 31, 41, 52, 60, 72, 83, 95,
100, 119, 121, 136, 140, 150, 166, 175, 188, 198,
209, 211, 221, 238, 243, 250, 261, 276, 289, 290,
306, 315, 325, 333, 344, 351, 368, 376, 382, 391,
62, 296, 327, 241, 393, 215, 64, 59, 185, 286,
162, 163, 364, 309, 220, 273, 32, 214, 217, 182,
172, 98, 19, 358, 92, 322, 68, 399, 226, 285,
103, 155, 249, 1, 75, 303, 311, 125, 339, 106,
127, 94, 101, 113, 35, 20, 189, 199, 128, 30,
131, 317, 337, 156, 340, 99, 397, 385, 384, 193]
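The same idea can also be written without explicit index arithmetic. What follows is only a sketch, not part of the original answer: it assumes a synthetic `target` array standing in for the loaded file, and it uses `np.random.default_rng`, `np.flatnonzero` and `np.setdiff1d`, which the code above does not use. It looks up each class's indexes from `target` itself, so it does not depend on the classes being stored in blocks of 10. Note that forcing one pick per class first does not make every valid 100-image subset equally likely.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only so the sketch is reproducible

# Assumption: stand-in for np.load("olivetti_faces_target.npy"),
# 40 classes with 10 images each
target = np.repeat(np.arange(40), 10)
n = 100  # number of images to retrieve

# One random index per class, looked up from the labels themselves
one_per_class = np.array(
    [rng.choice(np.flatnonzero(target == c)) for c in np.unique(target)]
)

# Fill the remaining slots from the indexes not chosen yet, without replacement
remaining = np.setdiff1d(np.arange(target.shape[0]), one_per_class)
extra = rng.choice(remaining, n - one_per_class.size, replace=False)

rand_indeces = np.concatenate([one_per_class, extra])
rng.shuffle(rand_indeces)  # optional: mix the guaranteed picks with the rest

target_random = target[rand_indeces]
# data_random = data[rand_indeces]  # with data loaded as in the question
```

With the real files, replace the synthetic `target` with the loaded array and uncomment the last line.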
One way to do this, as @xrisk suggested, would be to keep randomly drawing 100 images until the condition is satisfied, as follows:
n = 100 # number of images to retrieve
n_unique = len(np.unique(target)) # number of classes
containsAllCats = False
while not containsAllCats:
    rand_indeces = np.random.choice(data.shape[0], n, replace=False)
    data_random = data[rand_indeces]
    target_random = target[rand_indeces]
    containsAllCats = len(np.unique(target_random)) == n_unique
However, it seems to be rather inefficient: most of the time it requires several iterations before a draw contains all 40 classes.
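To get a rough feel for how often a single draw succeeds, one could simulate it. This is only a sketch: it assumes a synthetic `target` array (40 classes, 10 images each) in place of the loaded file, and the trial count and seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only

# Assumption: stand-in for np.load("olivetti_faces_target.npy")
target = np.repeat(np.arange(40), 10)
n = 100                            # images per draw
n_unique = np.unique(target).size  # number of classes
trials = 2000                      # arbitrary number of simulated draws

hits = 0
for _ in range(trials):
    idx = rng.choice(target.shape[0], n, replace=False)
    # A draw succeeds if every class shows up at least once
    if np.unique(target[idx]).size == n_unique:
        hits += 1

rate = hits / trials
print(f"fraction of draws containing every class: {rate:.3f}")
```

The expected number of iterations of the retry loop is roughly `1 / rate`.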