
Followers/following database structure

My website has a followers/following system (like Twitter's). My dilemma is creating the database structure to handle who's following whom.

What I came up with was creating a table like this:

 id  |  user_id  |  followers |  following
  1  |    20     |  23,58,84  |  11,156,27
  2  |    21     |  72,35,14  |  6,98,44,12
 ... |   ...     |    ...     |     ...

Basically, I was thinking that each user would have a row with columns for their followers and the users they're following. The followers and the people they're following would have their user IDs separated by commas.

Is this an effective way of handling it? If not, what's the best alternative?

That's the worst way to do it. It goes against normalization. Have two separate tables: Users and User_Followers. Users will store user information. User_Followers will look like this:

id | user_id | follower_id
1  | 20      | 45
2  | 20      | 53
3  | 32      | 20

User_Id and Follower_Id will be foreign keys referring to the Id column in the Users table.

There is a better physical structure than the ones proposed by other answers so far:
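As a rough illustration of the two-table layout above, here is a minimal sketch using Python's built-in `sqlite3` module. SQLite stands in for whatever database the site actually uses, and the `name` column, the `make_db` and `followers_of` helpers, and the sample user names are assumptions made for the example, not part of the answer.

```python
import sqlite3


def make_db():
    """Create an in-memory database with the two tables described above."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE users (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL              -- hypothetical column
        );
        CREATE TABLE user_followers (
            id          INTEGER PRIMARY KEY,
            user_id     INTEGER NOT NULL REFERENCES users (id),
            follower_id INTEGER NOT NULL REFERENCES users (id)
        );
    """)
    return conn


def followers_of(conn, user_id):
    """Return the ids of everyone following user_id, in ascending order."""
    rows = conn.execute(
        "SELECT follower_id FROM user_followers "
        "WHERE user_id = ? ORDER BY follower_id",
        (user_id,),
    )
    return [row[0] for row in rows]


conn = make_db()
conn.executemany(
    "INSERT INTO users (id, name) VALUES (?, ?)",
    [(20, "u20"), (32, "u32"), (45, "u45"), (53, "u53")],
)
# The same three rows as in the example table above.
conn.executemany(
    "INSERT INTO user_followers (user_id, follower_id) VALUES (?, ?)",
    [(20, 45), (20, 53), (32, 20)],
)
print(followers_of(conn, 20))
```

With the foreign keys in place, an attempt to record a follower id that does not exist in `users` is rejected by the database instead of silently creating a dangling reference.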

CREATE TABLE follower (
    user_id INT, -- References user.
    follower_id INT,  -- References user.
    PRIMARY KEY (user_id, follower_id),
    UNIQUE INDEX (follower_id, user_id)
);

InnoDB tables are clustered, so secondary indexes behave differently than in heap-based tables and can have unexpected overheads if you are not cognizant of that. Having a surrogate primary key id just adds another index for no good reason¹ and makes the indexes on {user_id, follower_id} and {follower_id, user_id} fatter than they need to be (because secondary indexes in a clustered table implicitly include a copy of the PK).

The table above has no surrogate key id and (assuming InnoDB) is physically represented by two B-Trees (one for the primary/clustering key and one for the secondary index), which is about as efficient as it gets for searching in both directions². If you only need one direction, you can abandon the secondary index and go down to just one B-Tree.

BTW, what you did was a violation of the principle of atomicity, and therefore of 1NF.
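To make the two search directions concrete, here is a small sketch in Python with `sqlite3`. It is an approximation rather than the InnoDB setup described above: SQLite's `WITHOUT ROWID` option is used to get a table stored in a B-Tree keyed by the composite primary key, and the index name `follower_reverse` and both helper functions are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- WITHOUT ROWID stores the rows in a B-Tree keyed by the primary key,
    -- which is SQLite's closest analogue to a clustered table.
    CREATE TABLE follower (
        user_id     INTEGER NOT NULL,
        follower_id INTEGER NOT NULL,
        PRIMARY KEY (user_id, follower_id)
    ) WITHOUT ROWID;

    -- Second B-Tree, for searching from follower to followee.
    CREATE UNIQUE INDEX follower_reverse ON follower (follower_id, user_id);
""")
conn.executemany(
    "INSERT INTO follower (user_id, follower_id) VALUES (?, ?)",
    [(20, 45), (20, 53), (32, 20)],
)


def followers_of(user_id):
    """Followee -> followers: a search on the primary key prefix."""
    rows = conn.execute(
        "SELECT follower_id FROM follower WHERE user_id = ? ORDER BY follower_id",
        (user_id,),
    )
    return [row[0] for row in rows]


def followed_by(follower_id):
    """Follower -> followees: a search on the prefix of the reverse index."""
    rows = conn.execute(
        "SELECT user_id FROM follower WHERE follower_id = ? ORDER BY user_id",
        (follower_id,),
    )
    return [row[0] for row in rows]


print(followers_of(20), followed_by(20))
```

Because the pair itself is the primary key, a duplicate follow is rejected without needing a separate id column or an extra unique constraint.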


¹ Every additional index takes space, lowers cache effectiveness, and impacts INSERT/UPDATE/DELETE performance.

² From followee to follower and vice versa.

One weakness of that representation is that each relationship is encoded twice: once in the row for the follower and once in the row for the followed user, which makes data integrity harder to maintain and updates tedious.

I would make one table for users and one table for relationships. The relationship table would look like:

id | follower | following
1  | 23       | 20
2  | 58       | 20
3  | 84       | 20
4  | 20       | 11
...

This way, adding a new relationship is simply an insert, and removing one is a delete. It's also much easier to roll up the counts to determine how many followers a given user has.

No, the approach you describe has a few problems.

First, storing multiple data points as comma-separated strings has a number of issues. It's difficult to join on (you can join using LIKE, but that will slow down performance), difficult and slow to search on, and can't be indexed the way you would want.

Second, if you store both a list of followers and a list of people being followed, you have redundant data (the fact that A is following B will show up in two places). That wastes space and also creates the potential for the data to get out of sync: if the database shows A on B's list of followers but doesn't show B on A's list of following, the data is inconsistent in a way that's very hard to recover from.

Instead, use a join table. That's a separate table where each row has a user id and a follower id. This stores each relationship in one place, allows indexing and joining, and also lets you add columns to that row, for example to record when the follow relationship started.
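The insert, delete, and count operations might look roughly like this. It is a sketch in Python with `sqlite3`; the table name `relationships` and the helper names `follow`, `unfollow`, and `follower_count` are assumptions, while the column names and sample rows come from the table above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE relationships (
        id        INTEGER PRIMARY KEY,
        follower  INTEGER NOT NULL,
        following INTEGER NOT NULL
    )
""")


def follow(follower, following):
    """Adding a relationship is a single INSERT."""
    conn.execute(
        "INSERT INTO relationships (follower, following) VALUES (?, ?)",
        (follower, following),
    )


def unfollow(follower, following):
    """Removing a relationship is a single DELETE."""
    conn.execute(
        "DELETE FROM relationships WHERE follower = ? AND following = ?",
        (follower, following),
    )


def follower_count(user_id):
    """Roll up how many users follow user_id."""
    row = conn.execute(
        "SELECT COUNT(*) FROM relationships WHERE following = ?",
        (user_id,),
    ).fetchone()
    return row[0]


# The four sample rows from the table above.
for pair in [(23, 20), (58, 20), (84, 20), (20, 11)]:
    follow(*pair)

print(follower_count(20))
```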
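The search problem is easy to reproduce. The sketch below (Python with `sqlite3`; the table name `user_lists` and both function names are assumptions) loads the two comma-separated rows from the question and looks for the users followed by a given id. A plain substring match returns false positives, and the delimiter-aware variant that avoids them has to build a new string for every row, which is why an ordinary index cannot help.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_lists (user_id INTEGER PRIMARY KEY, followers TEXT)"
)
# Follower lists taken from the question's example table.
conn.executemany(
    "INSERT INTO user_lists (user_id, followers) VALUES (?, ?)",
    [(20, "23,58,84"), (21, "72,35,14")],
)


def naive_search(follower_id):
    """Substring match: also hits ids that merely contain the digits."""
    rows = conn.execute(
        "SELECT user_id FROM user_lists "
        "WHERE followers LIKE ('%' || ? || '%') ORDER BY user_id",
        (str(follower_id),),
    )
    return [row[0] for row in rows]


def delimited_search(follower_id):
    """Wrap both sides in commas so only whole ids match."""
    rows = conn.execute(
        "SELECT user_id FROM user_lists "
        "WHERE (',' || followers || ',') LIKE ('%,' || ? || ',%') "
        "ORDER BY user_id",
        (str(follower_id),),
    )
    return [row[0] for row in rows]


# User 8 follows nobody, but the naive search matches "58" and "84".
print(naive_search(8), delimited_search(8))
```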
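A join table with one extra column could be sketched as follows, again in Python with `sqlite3`. The `followed_at` column, the `name` column, the sample users, and the `followers_with_dates` helper are all hypothetical names chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE user_followers (
        user_id     INTEGER NOT NULL REFERENCES users (id),
        follower_id INTEGER NOT NULL REFERENCES users (id),
        -- Extra column recording when the follow relationship started.
        followed_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
        PRIMARY KEY (user_id, follower_id)
    );
""")
conn.executemany(
    "INSERT INTO users (id, name) VALUES (?, ?)",
    [(1, "alice"), (2, "bob"), (3, "carol")],
)
conn.executemany(
    "INSERT INTO user_followers (user_id, follower_id) VALUES (?, ?)",
    [(1, 2), (1, 3), (2, 1)],
)


def followers_with_dates(user_id):
    """Join back to users to get each follower's name and follow time."""
    return conn.execute(
        """
        SELECT u.name, f.followed_at
        FROM user_followers AS f
        JOIN users AS u ON u.id = f.follower_id
        WHERE f.user_id = ?
        ORDER BY u.name
        """,
        (user_id,),
    ).fetchall()


for name, followed_at in followers_with_dates(1):
    print(name, followed_at)
```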
