Python - Sample one element randomly from a list based on the unique elements of another list
I have two lists, one containing user_ids and one containing item_ids. I want to randomly sample one item for each user.
For example:
user_ids = [1, 2, 3, 1, 2]
item_ids = [8, 9, 10, 5, 8]
I want to get (for instance):
val_user_ids = [1,2,3]
val_item_ids = [5,9,10]
I know some inefficient ways, such as looping. Is there an efficient way to do this, or an existing Python function for it?
To be precise, I want to create a validation set (from the training set) containing one item interaction for each user.
You can gather your data in a dictionary with the user_id as the key and a list of that user's item_ids as the value:
import collections
user_ids = [1, 2, 3, 1, 2]
item_ids = [8, 9, 10, 5, 8]
data = collections.defaultdict(list)
for key, value in zip(user_ids, item_ids):
    data[key].append(value)
The result is defaultdict(<class 'list'>, {1: [8, 5], 2: [9, 8], 3: [10]}).
Now we can loop over the dictionary and pick a random item from each list:
import random
result = [(key, random.choice(value)) for key, value in data.items()]
The result is [(1, 8), (2, 9), (3, 10)] (or [(1, 8), (2, 8), (3, 10)], or whatever else the randomization gives us).
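To get the two parallel lists the question asks for, the pairs can be unzipped afterwards. A minimal sketch, assuming the sample data from the question; the names val_user_ids and val_item_ids follow the question, and the seed is an optional addition that only makes a run repeatable.

```python
import collections
import random

user_ids = [1, 2, 3, 1, 2]
item_ids = [8, 9, 10, 5, 8]

# Group each user's items together.
data = collections.defaultdict(list)
for user, item in zip(user_ids, item_ids):
    data[user].append(item)

random.seed(0)  # optional, only for reproducibility

# Pick one item per user, then split the pairs into two parallel lists.
pairs = [(user, random.choice(items)) for user, items in data.items()]
val_user_ids, val_item_ids = (list(column) for column in zip(*pairs))
```

Because dictionaries keep insertion order, val_user_ids lists the users in the order they first appear in user_ids.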
Some more information about defaultdict: this kind of dictionary creates a default value the first time a missing key is accessed. The default is produced by the factory (here, list) passed as a parameter when creating the defaultdict. With a standard dict we have to handle the creation of the entry ourselves.
This is how it would be done manually:
user_ids = [1, 2, 3, 1, 2]
item_ids = [8, 9, 10, 5, 8]
data = dict()
for key, value in zip(user_ids, item_ids):
    if key not in data:
        data[key] = []
    data[key].append(value)
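The membership check in the manual version can also be folded into a single call with dict.setdefault. This is only an alternative sketch with the same sample data, not something the answer above relies on.

```python
user_ids = [1, 2, 3, 1, 2]
item_ids = [8, 9, 10, 5, 8]

data = {}
for key, value in zip(user_ids, item_ids):
    # setdefault returns the existing list for key, or inserts [] and returns it
    data.setdefault(key, []).append(value)

print(data)  # {1: [8, 5], 2: [9, 8], 3: [10]}
```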
Could you use numpy? The following example draws a random fraction of the (user, item) pairs rather than exactly one per user:
import numpy as np
# user_ids and item_ids are the two lists from the question
your_list_size = len(user_ids)
idx = list(range(your_list_size))
# size of the validation sample as a fraction of the data
val_size = 0.2
val_n = int(your_list_size * val_size)
# draw row indices; replace=False means sampling without replacement
chosen_idx = np.random.choice(idx, size=val_n, replace=False)
# look up the actual values at the chosen indices
sample_users = np.array(user_ids)[chosen_idx]
sample_items = np.array(item_ids)[chosen_idx]
Or, even more simply, if the user/item pairing does not need to be preserved (the two draws below are independent of each other):
sample_users = np.random.choice(user_ids, size=val_n, replace=False)
sample_items = np.random.choice(item_ids, size=val_n, replace=False)
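If exactly one item per user is needed while keeping each item paired with its own user, one vectorized option is to shuffle the row positions and keep the first shuffled row of every user. This is a sketch under that assumption, using the question's sample data; rng, perm, and first are illustrative names, not part of the answer above.

```python
import numpy as np

user_ids = np.array([1, 2, 3, 1, 2])
item_ids = np.array([8, 9, 10, 5, 8])

rng = np.random.default_rng()  # pass a seed here for reproducible draws

# Shuffle the row positions, then find the first shuffled row of each user.
perm = rng.permutation(len(user_ids))
val_user_ids, first = np.unique(user_ids[perm], return_index=True)

chosen_idx = perm[first]             # positions in the original arrays
val_item_ids = item_ids[chosen_idx]  # one item per user, pairing preserved
```

Since the permutation is uniform, each of a user's rows is equally likely to come first, so every user gets one of their own items with equal probability. chosen_idx can also be used to remove the selected rows from the training set.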
Assuming the items need to be sampled with replacement, the following code will work:
import random
user_ids = [1,2,3,1,2]
item_ids = [8,9,10,5,8]
val_user_ids = sorted(set(user_ids))
val_item_ids = [random.choice(item_ids) for item in val_user_ids]
The set built-in returns a set (unique items) from an iterable such as a list, and the sorted built-in then returns a sorted list (if you don't need sorting, just use list(set(user_ids))). The list comprehension then creates a new list with the items sampled from item_ids, with replacement (a comprehension is usually faster than the equivalent for loop). One caveat: the user_ids list needs to contain hashable (in practice, immutable) items for this code to work. Numbers are fine, as are strings, frozensets, and tuples, as long as the tuples do not contain mutable structures such as lists.
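The caveat can be seen with a small example. The tuple-based ids below are made up purely for illustration and do not come from the question.

```python
# Tuples of hashable values can go into a set.
user_ids_ok = [(1, "a"), (2, "b"), (1, "a")]
unique_ok = sorted(set(user_ids_ok))

# A tuple that contains a list is not hashable, so set() raises TypeError.
user_ids_bad = [(1, ["a"]), (2, ["b"])]
error = None
try:
    set(user_ids_bad)
except TypeError as exc:
    error = exc

print(unique_ok)
print("set() failed:", error)
```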
If instead you need to sample without replacement, you could use:
import random
user_ids = [1,2,3 ,1, 2]
item_ids = [8,9,10,5,8]
val_user_ids = sorted(set(user_ids))
random.shuffle(item_ids)  # note: shuffles item_ids in place
# pop from the end so the index can never run past the shrinking list
val_item_ids = [item_ids.pop() for _ in val_user_ids]
The same caveat about sets applies: user_ids can't contain anything mutable.