
Python - Sample one element randomly from a list based on the unique elements of another list

I have two lists containing user_ids and item_ids. I want to randomly sample one item for each user.

For example:

user_ids = [1, 2, 3, 1, 2]
item_ids = [8, 9, 10, 5, 8]

I want to get:

val_user_ids = [1, 2, 3]
val_item_ids = [5, 9, 10]

I know some inefficient ways, such as looping. Is there an efficient way to do this, or an existing Python function for it?

To be precise, I want to create a validation set (from the training set) containing one item interaction for each user.

You can gather your data in a dictionary with the user_id as the key and a list of that user's item_ids as the value:

import collections

user_ids = [1, 2, 3, 1, 2]
item_ids = [8, 9, 10, 5, 8]

data = collections.defaultdict(list)
for key, value in zip(user_ids, item_ids):
    data[key].append(value)

The result is defaultdict(<class 'list'>, {1: [8, 5], 2: [9, 8], 3: [10]}).

Now we can loop over the dictionary and pick a random item from each user's list:

import random
result = [(key, random.choice(value)) for key, value in data.items()]

The result is [(1, 8), (2, 9), (3, 10)] (or [(1, 8), (2, 8), (3, 10)], or whatever the random draw produces).
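To get the two parallel lists the question asks for, the list of pairs can be unzipped. This is a minimal sketch that repeats the grouping step so it runs on its own; the names val_user_ids and val_item_ids are taken from the question.

```python
import collections
import random

user_ids = [1, 2, 3, 1, 2]
item_ids = [8, 9, 10, 5, 8]

# Group each user's items, as in the answer above.
data = collections.defaultdict(list)
for key, value in zip(user_ids, item_ids):
    data[key].append(value)

# Pick one item per user, then unzip the pairs into two parallel lists.
result = [(key, random.choice(value)) for key, value in data.items()]
val_user_ids, val_item_ids = (list(t) for t in zip(*result))
```

Because dictionaries keep insertion order, val_user_ids follows the order in which each user first appears in user_ids.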


Some more information about defaultdict: this kind of dictionary creates a default entry when a missing key is accessed. The default is produced by the factory passed as a parameter when creating the defaultdict (here, list). With a standard dict, we have to create the entry ourselves.

This is how it would be done manually:

user_ids = [1, 2, 3, 1, 2]
item_ids = [8, 9, 10, 5, 8]

data = dict()
for key, value in zip(user_ids, item_ids):
    if key not in data:
        data[key] = []
    data[key].append(value)
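As a middle ground between the two versions, a plain dict can also be filled with dict.setdefault, which returns the existing list for a key or stores and returns a new one. This is only a sketch of an alternative spelling, not part of the original answer.

```python
user_ids = [1, 2, 3, 1, 2]
item_ids = [8, 9, 10, 5, 8]

data = {}
for key, value in zip(user_ids, item_ids):
    # setdefault returns the list already stored under key,
    # or stores the empty list and returns it if key is missing.
    data.setdefault(key, []).append(value)
```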

Could you use numpy? Example code:

import numpy as np

# user_ids and item_ids are the two lists from the question
your_list_size = len(user_ids)
idx = list(range(your_list_size))

# number of draws, based on the validation fraction
val_size = 0.2
val_n = int(your_list_size * val_size)

# draw positions without replacement (replace=False), so no position repeats
chosen_idx = np.random.choice(idx, size=val_n, replace=False)

# look up the actual values at the chosen positions
sample_users = np.array(user_ids)[chosen_idx]
sample_items = np.array(item_ids)[chosen_idx]

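Or, more simply, draw from each list directly (note that the two draws are independent, so the sampled users and items no longer correspond to the original pairs):

sample_users = np.random.choice(user_ids, size=val_n, replace=False)
sample_items = np.random.choice(item_ids, size=val_n, replace=False)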
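The snippets above draw a fixed number of positions rather than one per user. A possible NumPy sketch that keeps exactly one interaction per user is to shuffle the positions and keep the first shuffled occurrence of each user. The names rng, perm and first are illustrative and not from the original answer.

```python
import numpy as np

user_ids = np.array([1, 2, 3, 1, 2])
item_ids = np.array([8, 9, 10, 5, 8])

rng = np.random.default_rng()

# Shuffle the positions, then keep the first shuffled position of each user.
perm = rng.permutation(len(user_ids))
# return_index=True gives the index of each unique value's first occurrence.
val_user_ids, first = np.unique(user_ids[perm], return_index=True)
chosen_idx = perm[first]
val_item_ids = item_ids[chosen_idx]
```

Since the permutation is uniform, each of a user's interactions is equally likely to come first, and the user-item pairing is preserved through chosen_idx.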

Assuming the items need to be sampled with replacement, the following code will work:

import random

user_ids = [1,2,3,1,2]
item_ids = [8,9,10,5,8]
val_user_ids = sorted(set(user_ids))
val_item_ids = [random.choice(item_ids) for item in val_user_ids]

The set built-in returns a set (unique items) from an iterable such as a list, and the sorted built-in then returns a sorted list (if you don't need sorting, just use list(set(user_ids))). The list comprehension then creates (usually faster than an equivalent for loop) a new list with items sampled from item_ids, with replacement. One caveat: the user_ids list must contain hashable, in practice immutable, items for this code to work (numbers are fine, and so are strings, frozensets, and tuples, as long as the tuple does not contain mutable structures such as lists).

If instead you need to sample without replacement, you could use:

import random

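user_ids = [1, 2, 3, 1, 2]
item_ids = [8, 9, 10, 5, 8]
val_user_ids = sorted(set(user_ids))
random.shuffle(item_ids)  # shuffles item_ids in place
# pop() takes from the end; pop(i) with a growing i can run past the shrinking list
val_item_ids = [item_ids.pop() for _ in val_user_ids]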

The same caveat about sets applies (the elements cannot be mutable).
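For the validation-set goal stated in the question, it can help to sample positions rather than items, so that the chosen interaction can also be removed from the training data. The following is a rough sketch under that assumption; positions, val_idx, val_pairs and train_pairs are illustrative names.

```python
import collections
import random

user_ids = [1, 2, 3, 1, 2]
item_ids = [8, 9, 10, 5, 8]

# Group the positions (not the items) of each user's interactions.
positions = collections.defaultdict(list)
for idx, user in enumerate(user_ids):
    positions[user].append(idx)

# Choose one position per user for validation; the rest stay in training.
val_idx = {random.choice(idxs) for idxs in positions.values()}

val_pairs = [(user_ids[i], item_ids[i]) for i in sorted(val_idx)]
train_pairs = [
    (user, item)
    for i, (user, item) in enumerate(zip(user_ids, item_ids))
    if i not in val_idx
]
```

Note that a user with a single interaction (user 3 here) ends up with nothing left in the training pairs, which may or may not be acceptable for a given setup.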


