Efficient lookup table in Redis: implemented using Redis sets?

Question

I want to use Redis to store a large set of user_ids and, with each of these ids, the "group id" to which that user was previously assigned:

User_ID | Group_ID
   1043 | 2 
   2403 | 1

The number of user_ids is fairly large (~ 10 million); the number of unique group ids is about 3 - 5.

My purpose for this LuT is routine:

find the group id for a given user; and
return a list of other users (of specified length) with the same group id as that given user

There might be an idiomatic way to do this in Redis, or at least a way that's most efficient. If so, I would like to know what it is. Here's a simplified version of my working implementation (using the Python client):

# assume a redis server is already running
# create some model data:
import numpy as NP
from redis import Redis

NUM_REG_USERS = 100
user_id = NP.random.randint(1000, 9999, NUM_REG_USERS)
cluster_id = NP.random.randint(1, 4, NUM_REG_USERS)
D = zip(cluster_id, user_id)

r = Redis()

# populate the redis LuT (one set per group id):
for group, user in D:
    # cast NumPy integers to plain Python types the client can serialize
    r.sadd(str(group), int(user))

# the queries:
# is user_id 1034 in Group 1?
r.sismember("1", 1034)

# return 10 users in the same Group 1 as user_id 1034:
# (smembers returns a set, so convert it before slicing)
list(r.smembers("1"))[:10]     # assume user_id 1034 is in group 1

So I have implemented this LuT using ordinary Redis sets; each set is keyed to a Group ID (1, 2, or 3), so there are three sets in total.

Is this the most efficient way to store this data, given the type of queries I want to run against it?

Answer 1

Using sets is a good basic approach, though there are a couple of things in there you may want to change:

Unless you store the group ID for each user somewhere, you will need up to 5 round trips (one SISMEMBER per group) to get the group for a particular user - each operation is O(1), but you still need to consider latency. Usually it is fairly easy to avoid this - you have lots of other properties stored for each user, so it is trivial to add one for the group id.
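A minimal sketch of that idea, assuming a redis-py style client is passed in as `r`. The hash name `user:group` and the helper names are hypothetical, chosen only for illustration; the set keys follow the question's convention of using the group id as the key.

```python
USER_GROUP_HASH = "user:group"  # hypothetical key name for the user -> group mapping


def assign_group(r, user_id, group_id):
    """Record the membership in both directions."""
    r.sadd(str(group_id), user_id)              # group -> users (set)
    r.hset(USER_GROUP_HASH, user_id, group_id)  # user -> group (hash field)


def group_of(r, user_id):
    """Look up a user's group id with a single HGET, or None if unknown."""
    value = r.hget(USER_GROUP_HASH, user_id)
    return None if value is None else int(value)
```

With a connected client, `assign_group(r, 1043, 2)` followed by `group_of(r, 1043)` needs one round trip for the lookup instead of one SISMEMBER per group.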

You probably want SRANDMEMBER rather than SMEMBERS - I think SMEMBERS will return the same 10 items from your million item set every time. SMEMBERS also sends the entire set to the client before you take the first 10.
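As a rough sketch of the SRANDMEMBER approach, again assuming a redis-py style client passed in as `r` and sets keyed by group id as in the question. The function name `similar_users` is hypothetical.

```python
def similar_users(r, user_id, group_id, count):
    """Return up to `count` random users from the group, excluding `user_id`."""
    # Ask for one extra member so the result is still full
    # if the querying user happens to be drawn.
    members = r.srandmember(str(group_id), count + 1)
    others = [int(m) for m in members if int(m) != int(user_id)]
    return others[:count]
```

With a positive count SRANDMEMBER returns distinct members, so only the sampled ids cross the network rather than the whole set.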

Answer 1: accepted, 2011-07-12 02:30:06
